External and Internal Stimulus Factors in
Rorschach Performance 


James Bieri and Edward Blacker 


Harvard University ¹


It is customary to conceive of the Ror- 
schach situation as one in which the subject 
(S) may respond to a variety of stimulus ele- 
ments which are available to him in the blots. 
The ordering of the stimuli or determinants 
available to S usually centers about the form 
quality of the blot. That is, the essential shape 
of the blot may constitute the only stimulus 
factor utilized in a given response. However, 
other stimulus factors may enter into the 
determination of the response. These include 
movement, color, and shading, and each may 
modify or supplant the form determinant. It 
is possible to construe these determinants as 
falling on a continuum of external-internal 
stimulus factors. Sarason (7), for example, 
has spoken of external and internal referents 
in Rorschach performance. Thus, color is an 
aspect of the blot that is physically present 
in the blot as a stimulus. In this respect it is 
a readily available external stimulus factor. 
On the other hand, movement represents an 
internal stimulus factor having no intrinsic 
representation in the blot material. It is, so 
to speak, a stimulus quality more proximal to 
S, one which is apparently more a function of 
some internal process rather than of the ex- 
ternal stimulus. 

Thus we may assume that two categories
of Rorschach responses exist: those responses
whose determinants represent external qualities
of the stimulus material, and those responses
whose determinants are more a function
of S than of the physical stimulus material.
In the former category would be placed
the form, color, and shading responses, while
in the latter would be found the various types
of movement responses. It would seem appropriate,
therefore, to analyze and establish
behavioral correlates of these two classes of
conceptual responses. If these two classes of
responses do indeed have different behavioral
correlates, then their utilization in a broader
clinical and personality framework would seem
warranted.

1 This study was facilitated by a grant from the
Laboratory of Social Relations, Harvard University.

In the present study, the chief emphasis 
will be on the two representatives of these re- 
sponse categories discussed above which are 
utilized in Rorschach’s experience-type, i.e., 
human movement and color responses. Start- 
ing with Rorschach himself (6), the experi- 
ence-type has been a fundamental variable in 
Rorschach interpretation. Both human move- 
ment and color responses hold key positions 
because of their frequency of occurrence and 
of their assumed clinical significance. In ad- 
dition, there is an increasing body of research 
literature relevant to the personality and be- 
havioral characteristics associated with these 
two types of responses. Secondarily, this study 
will be concerned with two other kinds of re- 
sponses on the external-internal stimulus con- 
tinuum, i.e., shading and nonhuman move- 
ment responses, respectively. The behavioral 
characteristic which will be primarily em- 
ployed in investigating these two classes of 
responses will be that of reaction time. 

Investigating these two classes of responses 
by means of reaction time behavior can be 
understood when we consider the nature of 
the response categories discussed above. Color 
is an immediately available external stimulus, 
which may be utilized in giving a response. 
Movement, however, has in addition an inter- 
mediate level of response. The S perceives the 
form quality of the blot, apparently rejects or
ignores other external stimulus factors which
may be present, and invokes instead an in- 
ternal modification of the stimulus such that 
movement is perceived in the blot. The inter- 
vention of this intermediate, more subjective 
process implies that, other things being equal, 
the time required for the movement response 
may be greater than that for the color re- 
sponse. Schematically, we may represent this 
state of affairs as follows: 


External Stimulus (F, C, Sh) → Color Response

External Stimulus (F, C, Sh) → Internal Stimulus → Movement Response


Several recent studies have investigated
this relationship originally discussed by Ror- 
schach, that is, that movement or kinesthetic 
responses represent an absence or delay of 
acting-out, overt motor behavior, while color 
responses imply a greater tendency to im- 
mediate action. Siipola and Taylor (10), us- 
ing achromatic blots, found a greater produc- 
tion of M responses under “free” conditions 
than under “pressure” conditions. Similar re- 
sults were obtained by Singer, Meltzoff, and 
Goldman (11). In their study, when Ss were 
forced to delay or inhibit responses, more M 
responses were produced. Other studies in this 
area have also obtained results suggesting a 
positive relationship between M production 
and inhibited motor behavior (5, 12). These 
studies (5, 8, 10, 11, 12) have been pri- 
marily concerned with movement responses 
rather than color responses. Thus, while the 
M:Sum C ratio implies a comparison of two 
response variables in predicting behavior, re- 
search has dealt chiefly with single variables, 
notably that of human movement. In the 
present study, primary emphasis is placed 
upon the comparison of experience-type patterns
(introversive, extratensive, and ambiequal)
rather than upon the individual components
of the experience-type.

With these considerations in mind, the fol- 
lowing problems were posed for investigation: 

1. Utilizing reaction time as a behavioral
criterion, are there differences in the behavior
of Ss depending upon the kind of experience
balance produced?

2. If there is a difference in the behavior of 
Ss with different experience types, is this dif- 
ference specific to responses of a certain type, 
or is it general to all responses to the blots? 

3. If behavioral differences are found in 
Problem 1 between Ss who emphasize re- 
sponses with internal and external stimulus 
factors, will the same sort of behavioral dif- 
ferences exist between Ss in relation to other 
Rorschach responses belonging to these two 
categories? 


Method 


Owing to the nature of the study, it was 
necessary to modify the Rorschach procedure 
so that a reaction time could be obtained for 
each response and to insure that each S re- 
acted to the identical blot material. To meet 
these demands, only a portion (a Rorschach 
D) of each of the standard blots was used. 
The portions of the blots utilized were selected
according to the ease with which any
given blot area could be given variable interpretations
in terms of determinants and content.
Thus, each blot area selected was judged
capable of eliciting at least two of the four 
major determinants, i.e., form, color, shading, 
and movement. In addition, the number of 
achromatic and chromatic blots was in the 
same proportion as in the regular series (five 
achromatic, five chromatic blots). The blot 
areas chosen, in order of the administration, 
are listed below with Beck’s number nota- 
tion (1). 


Card I: entire center “figure” area (D 4)
Card II: upper left red (D 2)
Card IV: lower center area (D 1)
Card III: upper left red inverted (D 2)
Card V: entire right half without “antenna” (D 4)
Card VIII: bottom center pink and orange area inverted (D 2)
Card VI: entire left half of card without top projection, card rotated 90 degrees to right (D 4)
Card IX: left green figure, card rotated 90 degrees to right (D 1)
Card VII: entire right half of card (D 9)
Card X: upper left blue area (D 1)


Each blot was prepared by making a template,
the same size as a Rorschach card, of
white posterboard in which a slot was cut
large enough to expose the desired blot portion.
Each template was taped to its respective
card with the appropriate blot portion
exposed. Thus, each blot area was exposed on
a card the same size as the original Rorschach
cards.


Administration 


Each S was given the modified Rorschach 
individually by either of the two authors. The 
S was told that he was going to be shown a 
series of inkblots. He was instructed to look 
at the entire blot and to tell the examiner 
(E) what it looked like or what it could be.
The S was again reminded to look at the 
whole blot. Following this, the first card was 
presented to S. A stop watch was held un- 
obtrusively by E, and the reaction time to 
each response was noted. As soon as S gave
his initial response to the card, the card was 
removed and replaced with the next card in 
the series. The E handed each card to S in 
the position in which it was to be viewed. If 
S started to turn the card, E asked him to 
keep it in the original position. Attempts to 
turn the cards were surprisingly few, and by 
having S hold each card it was possible to 
approximate more closely actual Rorschach 
testing conditions. 

After S had given responses to the ten blots, 
he was then told that he would look at the 
blots again, and this time he was to tell E 
what else he saw, what else it might look like. 
Following this second series of the cards, the 
identical procedure was repeated once more. 
On each response of each of the three series 
the reaction time and content were recorded 
by E. At the end of the third series, an in- 
quiry was conducted card by card in the usual 
manner. Thus, a total of thirty responses was 
obtained for each S, three associations having 
been given to each of the ten blots. 

Owing to the nature of the modified Ror- 
schach used, some of the Ss repeated a re- 
sponse which they had given previously to 
the same blot. In these cases S would usu- 
ally say, “I can’t see anything different from 
last time,” or “It still looks like a ____ to
me.” Under these circumstances, E would record
the reaction time when the statement
was made, and the response would be scored
as originally given.


Subjects 


A total of 40 paid male undergraduate Ss, 
primarily sophomores, was used in this study. 


Scoring 


All the Rorschach protocols were scored by 
one E, and ten of the protocols were selected 
at random and scored by the other E. Two
measures of reliability were used. First, the 
percentage of agreement of scoring for each 
protocol was obtained. The average percent- 
age of agreement obtained for the ten records 
was 83. In addition, the number of responses 
in each determinant category was counted, 
and the scores for the two scorers were cor- 
related using the rank-order coefficient. These 
categories included M, FM, m, F, FC, CF, 
and Sh. The Sh category included all shading 
responses of the Klopfer type as well as the 
use of achromatic color (K, KF, FK, Fc, cF, 
c, FC’, C’F, and C’). The Sh category is simi- 
lar to that originally proposed by Hertz (4), 
except that in the present study the shading 
responses were not weighted but merely added 
to form a total Sh score. The average inter- 
rater reliability for these several scoring cate- 
gories was .85. 
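The two reliability measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' computation: the function names are hypothetical, any data passed in would be the reader's own, and tied values are given averaged ranks, a detail the paper does not specify.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of responses given the same determinant score by both scorers."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("protocols must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_order_rho(x, y):
    """Rank-order coefficient: the product-moment correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

In use, `percent_agreement` would be applied to each doubly scored protocol and averaged, while `rank_order_rho` would be applied to the two scorers' counts in a given determinant category across the ten protocols.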


Results 


On the basis of the Rorschach scores, each
S was assigned to one of three experience-type
groups. The introversive (M > Sum C)
group consisted of the Ss who showed a predominance
of M over Sum C. This ratio was
computed in the usual fashion, with increasing
weights of ½, 1, and 1½ being given to
FC, CF, and C responses, respectively. In order
to be placed in the M > Sum C group,
S had to have an M score that was at least
two higher than his Sum C score. A total of
17 Ss met this criterion and were placed in
the M > Sum C group. Similarly, 11 Ss were
placed in the extratensive (Sum C > M)
group because their Sum C scores were at
least two higher than their M scores. The remaining
12 Ss were placed in the ambiequal
group inasmuch as their M and Sum C scores
differed by less than two from each other.
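The grouping rule just described can be written out as a short Python sketch. It assumes the conventional Rorschach weights of 0.5, 1.0, and 1.5 for FC, CF, and C responses; the function names and the `margin` parameter are hypothetical labels introduced here for illustration.

```python
def sum_c(fc, cf, c):
    """Weighted color score, assuming the conventional weights 0.5, 1.0, 1.5."""
    return 0.5 * fc + 1.0 * cf + 1.5 * c


def experience_type(m, fc, cf, c, margin=2.0):
    """Assign a subject to an experience-type group.

    A subject is introversive if M exceeds Sum C by at least `margin`,
    extratensive if Sum C exceeds M by at least `margin`, and ambiequal
    if the two scores differ by less than `margin`.
    """
    difference = m - sum_c(fc, cf, c)
    if difference >= margin:
        return "introversive"
    if difference <= -margin:
        return "extratensive"
    return "ambiequal"
```

For example, a hypothetical subject with 8 M, 2 FC, and 1 CF responses has a Sum C of 2.0 and would fall in the introversive group.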
Table 1

Mean Rorschach Scores for M > Sum C Group (N = 17),
Sum C > M Group (N = 11), and Ambiequal
Group (N = 12), and for All Subjects
Combined (N = 40)

Score    M > Sum C    Sum C > M      Ambiequal    Total
M           7.9*         1.6            3.0         4.7
FM          2.8          2.4            2.2         2.5
m           1.1          1.5            0.8         1.2
F          12.5         12.8           17.3*       14.0
Sh          2.8          4.6            2.8         3.3
FC          1.5          2.1            1.5         1.6
CF          1.1      [illegible]        2.4         2.6
Sum C       1.8          6.1*           3.3         3.5

* Higher than mean values of other two groups on this
variable at .01 level of significance (two tails).


The mean scores for all Ss combined and for 
each group are presented in Table 1. The ratio 
of the mean M and mean Sum C scores for 
the M > Sum C group was 7.9:1.8, for the 
Sum C > M group 1.6:6.1, and for the ambi- 
equal group 3.0:3.3. The differences between 
the means of the M and Sum C scores for 
both the M > Sum C group and the Sum C 
> M group are significant below the .01 level. 
On the other hand, the difference between the 
means of the M and Sum C scores for the 
ambiequal group is not significant. All tests 
of significance between means in this study 
were made by the use of White’s ranking 
procedure (3). 

Table 2 presents the mean reaction time 
scores in seconds for each of the three groups. 
It will be noted from Table 2 that the mean 
reaction times for the M > Sum C group for 
Series 2 and 3, as well as for the total reac- 
tion times, were significantly higher (p = .05) 
than the corresponding mean reaction times 


Table 2

Mean Reaction Time in Seconds for M > Sum C,
Sum C > M, and Ambiequal Groups

Series      M > Sum C    Sum C > M    Ambiequal
Series 1       17.4         10.0         12.3
Series 2       29.7*        12.4         19.3
Series 3       35.4*        18.2         18.1
Total          27.4*        13.6         16.6

* Higher than corresponding values in Sum C > M group
at the .05 level of significance (two tails).


of the Sum C > M group. While the ambi- 
equal group has reaction times generally in- 
termediate between those of the M > Sum C 
group and the Sum C > M group, these mean 
reaction times of the ambiequal group were 
not significantly different from the mean re- 
action times of the other two groups. 

Thus, in terms of Problem 1 of the study, 
we may say that groups of Ss who differ in 
terms of experience balance also tend to differ 
in terms of their reaction time behavior. These 
differences are generally significant between 
the M > Sum C and Sum C > M groups, the 
latter having shorter reaction times than the 
former. The ambiequal group tends to have 
reaction times intermediate between the pre- 
dominantly movement and color producing 
groups. 

Next we turn to Problem 2 in order to de- 
termine whether these differences are general 
to the Ss’ behavior on the Rorschach or 
whether specific kinds of responses were ac- 
countable for the differences in reaction time 
behavior. Specifically, we will compare the 
mean reaction time of each S’s M responses 
with his mean reaction time on his CF re- 
sponses, the latter being selected because more 
CF responses were given by Ss than any other 
type of color responses. For the 33 Ss who 
gave both M and CF responses, only 16 had 
greater mean reaction times to M responses 
than to CF responses. Furthermore, if we 
consider the M > Sum C and Sum C>M 
groups separately, we find significant positive 
rank-order correlations between mean reac- 
tion time on M responses and mean reaction 
times on CF responses (rho = .71 and .62, 
respectively). These findings indicate that Ss’ 
reaction time behavior in this situation is a 
general phenomenon, and that M responses 
individually do not elicit longer reaction times 
than do individual CF responses. These find- 
ings are in agreement with other researches 
(10, 11) which demonstrate that the over-all 
response set to the Rorschach situation is re- 
lated to types of responses elicited. 

It should be noted that as Ss proceeded 
from one series to the next, the conceptual 
problem of giving a response became progres- 
sively more difficult. This is reflected in the 
general increase in mean reaction times from 
series to series (Table 2). Thus, the difference
between mean reaction time on series 1
and series 3 is significant at the .01 level for 
the M > Sum C group, at the .05 level for 
the Sum C > M group, but not significant for 
the ambiequal group. Because Ss would at 
times repeat responses due to this increased 
difficulty, it is important to rule out the pos- 
sibility that repetitions per se may have con- 
tributed to the differences found in reaction 
times between groups. That this is not the 
case is seen in the fact that there are no sta- 
tistically significant differences between the 
mean number of repetitions among the M > 
Sum C, Sum C > M and ambiequal groups. 
These mean values are 3.8, 2.7, and 6.3, re- 
spectively. In addition, the Pearson product- 
moment coefficient for all forty Ss between 
mean reaction time and number of repetitions 
is only .13. 

The third question raised was whether the 
differences in reaction time behavior obtained 
in Problem 1 also obtain in regard to other 
determinants in the same categories as the M 
and C responses. Therefore, we should ask 
whether the differences in reaction time be- 
havior which hold for M and C responses also 
hold for the other movement responses and 
for the shading responses we designate as Sh. 
Because of the relatively low frequency of 
both FM and m responses (Table 1), these 
two types of responses were combined into 
one group, namely (FM + m). 

Since (FM + m) responses belong to the 
internal stimulus factor group, we should ex- 
pect that (FM + m) responses would be posi- 
tively correlated with M responses and reac- 
tion time, and negatively correlated with color 
responses and shading responses. Conversely, 
we expect Sh responses, since they belong to
the external stimulus factor group, to be positively
correlated with color responses and
negatively correlated with M and (FM + m) 
responses and with reaction time. To analyze 
these relationships, product-moment correla- 
tions were computed between these variables 
for all forty Ss. These results are presented 
in Table 3. 

Three of the six variables in Table 3 had 
scores which were transformed by means of 
the logarithmic transformation in order to 
normalize their distributions. These were M, 
Sum C, and mean reaction time. The other 
variables had distributions which were rela- 
tively normal when original measures were 
utilized. 
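The transformation and correlation steps described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the `offset` argument is an assumption added here because the paper does not say how zero scores (for which a logarithm is undefined) were handled.

```python
import math


def log_transform(scores, offset=0.0):
    """Return log10(score + offset) for each score.

    `offset` is a hypothetical parameter: a positive value would be needed
    if any score is zero, a case the paper does not discuss.
    """
    return [math.log10(s + offset) for s in scores]


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Under these assumptions, the correlation between M and mean reaction time would be obtained as `pearson_r(log_transform(m_scores), log_transform(reaction_times))`, while untransformed variables such as Sh would be passed to `pearson_r` directly.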

From Table 3 we see that our expectations
regarding the (FM + m) and Sh variables
are partially borne out by the correlations obtained.
(FM + m) correlates .17 with M, .14
with mean reaction time, and -.05 with Sum
C. None of these correlations is significantly
different from zero, and these results may be
attributed to chance alone. As for the Sh
variable, we find it correlates .27 with Sum C,
-.23 with M, -.50 with mean reaction
time, and -.37 with (FM + m). The latter
two coefficients are significant at the .01 and
.05 levels, respectively. The significant negative
correlation between (FM + m) and Sh
suggests that we are indeed dealing with two
opposite response tendencies relative to these
two variables. Thus, Ss who give more of one
will tend to give less of the other. It will be
noted from Table 3 that the Sh variable is
more negatively correlated with mean reaction
time than is the Sum C variable. This suggests
that the M:Sh ratio would be at least
as efficient a predictor of reaction time as the
M:Sum C ratio. Indeed, if we substitute each


Table 3

Intercorrelations of Rorschach Scores and Mean Reaction Time for All Subjects (N = 40)

Score        Sum C      F       (FM + m)     Sh      Mean RT
M            -.42**   -.47**      .17       -.23       .35*
Sum C                 -.25       -.05        .27      -.25
F                                -.35*      -.38*     -.04
(FM + m)                                    -.37*      .14
Sh                                                    -.50**

* .05 level of significance (two tails).
** .01 level of significance (two tails).
S's Sh score for his Sum C score in calculating
his experience-type, we find that 13 of the
17 Ss in the M > Sum C group and eight of
the 11 in the Sum C > M group still meet the
experience-type criterion for their respective
groups (χ² corrected for continuity, p = .03).
Grouping Ss according to the M:Sh ratio in
an analogous fashion to the M:Sum C groupings,
we find twenty Ss in the M > Sh group,
12 in the Sh > M group, and eight in the
ambiequal group. The mean total reaction
times for these groups are 27.8, 11.4, and
15.4, respectively. Comparison of these values
with the corresponding M:Sum C values in
Table 2 shows the very close similarity between
the mean total reaction times. Again,
there is a significant difference (p = .01) between
the mean total reaction time scores of
the M > Sh and the Sh > M groups.
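The paper does not state which test was used to evaluate the correlations in Table 3. Assuming the standard t test for a product-moment correlation with N - 2 degrees of freedom, the reported significance levels can be checked with the sketch below. The function names are hypothetical, and the critical values are approximate figures from standard t tables for 38 degrees of freedom, not values given in the paper.

```python
import math

# Approximate two-tailed critical values of t for 38 degrees of freedom,
# taken from standard tables (an assumption; not given in the paper).
T_CRIT_05 = 2.02
T_CRIT_01 = 2.71


def t_for_r(r, n):
    """t statistic for testing a product-moment correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def significance_label(r, n=40):
    """Return '.01', '.05', or 'n.s.' for a correlation based on n subjects."""
    t = abs(t_for_r(r, n))
    if t >= T_CRIT_01:
        return ".01"
    if t >= T_CRIT_05:
        return ".05"
    return "n.s."
```

With N = 40, the correlation of -.50 between Sh and mean reaction time gives a t of about -3.56, beyond the .01 critical value, while the -.25 between Sum C and mean reaction time gives a t of about -1.59, short of the .05 value. Both outcomes agree with the asterisks in Table 3.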


Discussion 


The results of this study are congruent with 
other studies discussed earlier in indicating 
that the time dimension is an important vari- 
able involved in the production of Rorschach 
responses. This seems to be the case par- 
ticularly for M and Sh responses. Although 
shorter reaction times are produced by the 
Sum C > M group, reference to Table 3 in- 
dicates that the negative correlation between 
Sum C and mean reaction time is not signifi- 
cant, whereas the positive correlation between 
M and mean reaction time is significant. Thus, 
the presence or absence of M responses may 
be more crucial to reaction time behavior than 
the presence or absence of color responses. 
However, the significant negative relationship 
between Sh and reaction time supports the
contention that short reaction time is asso- 
ciated with the fuller utilization of external 
stimulus factors in the blots whether these 
factors be color or shading. It may be that 
the side effects of color in the blots, as sug- 
gested in Siipola’s work on hue-form incon- 
gruency (9), and as observed clinically in so- 
called “color shock,” served to decrease the 
negative relationship between latency and the 
use of color. 

It should be pointed out that other studies 
have found some indications of an inverse 
relationship between movement and shading
responses. Cox and Sarason (2) report that
while high anxiety Ss gave more M, FM, and 
m, low anxiety Ss gave more Fc, cF, and c 
responses. Similarly, Singer, Meltzoff, and 
Goldman (11) found a decrement in shading 
responses with inhibited Ss who produced 
more M responses. We see these findings as 
consistent with our thesis that different con- 
ceptual processes are involved when internal 
and external stimulus factors are utilized in 
inkblot reactions. 

The ambiequal group provides several im- 
portant additions to the findings. From Table 
1, it will be observed that this was essentially 
a form response group, i.e., it produced sig- 
nificantly more F responses than the other 
two groups. Further, Table 2 indicates that 
the reaction times of this group more nearly 
approximated those of the Sum C > M group 
than of the M > Sum C group. Since we 
originally assumed form was an external 
stimulus factor, these findings would again be 
consistent with our underlying assumptions. 
However, several findings suggest that F re- 
sponses may be on a different continuum 
than the external-internal stimulus factor con- 
tinuum. Thus, if form is construed as an ex- 
ternal stimulus factor, we should expect it to 
correlate negatively with mean reaction time. 
The findings in Table 3 show this correlation 
to be only -.04. In addition, if form is an
external stimulus factor, we should expect F
would correlate more negatively with M and
(FM + m) than with Sum C and Sh. However,
we find in Table 3 that F correlates
about equally with Sh (r = -.38) and with
(FM + m) (r = -.35). Thus, form may
represent a separate stimulus factor which
can be combined with Rorschach determinants
on the external-internal continuum.

Intelligence appears to have played little 
role in producing the results of this study. 
There were no significant differences between 
the three groups on the vocabulary subscale 
of the Wechsler-Bellevue. None of the Ror- 
schach variables studied correlated signifi- 
cantly with intelligence as measured by 
vocabulary. In addition, this intelligence 
measure correlated only .16 with mean re- 
action time. 

The approach presented in this paper has 
emphasized the formal, conceptual properties 
of inkblot reactions rather than the clinical
or personality correlates of such reactions. It 
is our belief that both these approaches are 
of value in helping to formulate general prin- 
ciples of behavior from which predictions use- 
ful to the clinician can be made. With more 
understanding of the conceptual processes in- 
volved in reactions to blots, and of the be- 
haviors associated with these processes, we 
can proceed to develop a theoretical frame- 
work for Rorschach behavior which is both 
experimentally verifiable and clinically useful. 


Summary 


The assumption is made that all Rorschach 
responses can be conceptualized into two cate- 
gories, i.e., those responses whose determinants 
include only external stimulus factors (form, 
color, and shading) and those responses whose 
determinants include internal stimulus factors 
(movement). The primary purpose of the 
study was to determine if there was a differ- 
ence in the behavior of Ss who showed a 
predominance of one or the other of these 
two response modes. Due to the conceptual 
differences implied by these two types of re- 
sponses, it was decided that reaction time be- 
havior might provide a critical behavioral 
differentiation. In order to insure that each S 
had the opportunity to give as many different 
kinds of responses as possible, and in order 
to obtain reaction times for each response, 
a modification of the Rorschach was used. 
This modification consisted of ten selected 
details, one from each blot, which were shown 
to S in three consecutive series, always in the 
same order. Consequently, thirty responses 
were obtained from each of the forty Ss. 
Each S was placed in one of three groups 
depending upon whether his experience-type 
was introversive (M > Sum C), extratensive 
(Sum C > M), or ambiequal. The main find- 
ings were: 

1. The Ss in the M > Sum C group gener- 
ally had significantly longer reaction times to 
the blots than did Ss in the Sum C>M 
group. 

2. This difference was general to the Ss’ 
performance and was not a specific effect of 
longer reaction times in giving movement re- 
sponses or shorter reaction times in giving 
color responses. 


3. Other responses in the internal and ex- 
ternal stimulus factor categories were found 
to differentiate reaction time behavior in 
an analogous fashion. Thus, (FM + m) re- 
sponses correlated positively but insignifi- 
cantly with reaction time. Shading responses 
(Sh) correlated significantly in a negative di- 
rection with reaction time. Both (FM + m) 
and Sh responses correlated in the same di- 
rection with other Rorschach variables as did 
the M and Sum C responses. The M:Sh ratio
proved to differentiate Ss’ total reaction time
behavior as well as the M:Sum C ratio. These
findings suggest, in conclusion, that the cate- 
gorization of Rorschach determinants into in- 
ternal and external stimulus factors is of value 
in understanding the underlying conceptual 
processes involved in inkblot reactions. 


Received June 14, 1955. 
Evaluation of Assumptions Underlying Interpretation
of Sentence Completion Tests ¹


David K. Trites 


School of Aviation Medicine, USAF, Randolph Field, Texas 


The present study was undertaken to evaluate
two of the assumptions sometimes made
in clinical, but more frequently in psychometric,
or objective, use of projective-type sentence
completion tests. (a) Incomplete sentence
stimulus items whose verbal content and
meaning are immediately apparent and generally
agreed upon will elicit responses which
refer to this generally agreed upon meaning.
(b) Systems of response classification which
represent the attitudes and objects referred
to by both stimulus and response may be developed
by considering only the verbal content
of the stimuli.

The evaluation of these assumptions was 
based upon a factor analysis of the tetra- 
choric intercorrelations of 74 of the items 
from an 88 item sentence completion test 
which had been administered as part of an 
experimental battery of adaptability screen- 
ing tests to aviation cadets and student offi- 
cers as they entered flight training. For the 
correlations, the responses of 392 cadets were 
classified dichotomously as indicating either 
a positive or negative attitude with reference 
to adjustment to the pilot training program. 
Independently, responses were classified into 
one of 13 scoring categories covering what 
seemed to be the psychologically relevant 
areas with respect to both stimulus items and 
adaptability criteria. 

1 An extended report of this study was published
as Psychiatric screening of flying personnel: Evaluation
of assumptions underlying interpretation of sentence
completion tests, by David K. Trites. Air University,
USAF School of Aviation Medicine, No. 55-33.
Randolph Field, Texas, March, 1955. It may be
obtained without charge from David K. Trites,
School of Aviation Medicine, USAF, Randolph Field,
Texas.


Application of the complete centroid method 
of factoring yielded four factors. Blind rota- 
tion produced a relatively well-defined simple 
structure. 

Factor I, called the Self-Centered Anxiety
Factor, was defined by items containing a
negative statement and items referring to
bodily characteristics. Apparently the factor
reflects a dimension not explicitly recognized
when the scoring key was formulated.

Factor II, named the Air Force Motivation
Factor, had highest loadings on items
eliciting responses referring to Air Force activities,
and corresponded to one of the categories
of the scoring key.

Factor III, the Interpersonal Factor, was
best defined by items representing an orientation
toward interpersonal objects and pleasures.
The factor seemed to be a complex of
two of the scoring categories representing
interpersonal activities.

Factor IV, called the Narcissistic (or Self- 
Enhancement) Factor, had its highest load- 
ings on items referring to parents, childhood, 
and most of the remaining items containing a 
personal pronoun which were not loaded on 
Factor I. Some elements of one of the most 
complex scoring categories were described by 
this factor. 

The extraction and interpretation of these 
four factors support the first assumption by 
defining a communality of interpretation of 
the item stimuli. The overlap of the four factors
with the categories of the scoring key is
partial support for the second assumption.


Brief Report 


Received October 25, 1955. 
The Influence of Color on Reactions to
Incomplete Figures 


Jacob Berg ¹ and C. J. Polyot


University of Maine 


Throughout the history of the Rorschach 
test, no problem has received more attention 
than the clinically observed disturbances 
which some subjects display when viewing the 
chromatic inkblots (2, 3, 7). One of the more 
significant experimental efforts which deals 
with the influence of color on reactions to ink- 
blots is by Siipola (9) in which she disputed 
Rorschach’s assumption that color per se 
endowed an inkblot with “magic, affect- 
arousing properties.” Siipola postulated that 
affective phenomena such as associative block- 
ing, strong emotional reactions, and symptoms 
of conceptual and behavioral disorganization 
which occurred in some Ss when viewing a 
chromatic inkblot were probably due to the 
conceptual difficulty arising from “hue-form 
incongruity.” 

This difficulty is said to occur when the 
form of the configuration suggests a concept 
to which the color is not appropriate. Pri- 
marily, this incongruity creates a delay in 
response as indicated by the increased reac- 
tion time in responding to the chromatic blots 
as compared to an identical but achromatic 
set. At times, this incongruity also leads to 
significant changes in the content of responses. 
Since Siipola’s study has had considerable in- 
fluence on the theoretical re-evaluation of the 
role of color on reactions to inkblots (1, 2, 4, 
5, 7, 8), her findings are worth considering. 

Siipola cut out 20 colored portions of the 
original Rorschach test cards and photo- 
graphed them. The achromatic set was shown 
to 60 women while the chromatic set was shown 
to 72 women. Only the first conceptual response 
elicited spontaneously was used. The mean re- 


1 Senior author is now associated with the Stu-
dent Counseling Bureau, University of Minnesota.


action time between the two groups for each 
card was compared. She found the chromatic 
cards required a longer mean reaction time 
than the achromatic set; only two exceptions 
occurred where the reverse was true. Unfor- 
tunately, no statistical test was computed to 
determine whether these differences might not 
be due to chance fluctuations. Since the hy- 
pothesis of hue-form incongruity leaned so 
heavily on this finding, mean latency of re- 
sponse scores—which are notoriously skewed 
—hardly suffices as an adequate support. In a 
recent study by Lazarus and Oldfield (5) 
which duplicated much of Siipola’s study, there 
was no clear-cut indication that the chromatic 
cards delayed the response significantly more 
than the achromatic mates. 
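
The objection to mean latencies can be made concrete with a small sketch. The latencies below are invented for illustration and are not taken from Siipola's data or from any other study; the point is only that a few very slow responses pull the mean well above the median of a skewed distribution.

```python
from statistics import mean, median

# Hypothetical reaction times in seconds for one card (NOT data from any study).
# Most Ss respond quickly, but two take a long time, so the scores are skewed.
latencies = [4, 5, 5, 6, 6, 7, 7, 8, 25, 30]

mean_rt = mean(latencies)      # pulled upward by the two slow responses
median_rt = median(latencies)  # closer to the typical response

print(f"mean = {mean_rt:.1f} sec, median = {median_rt:.1f} sec")
```

With scores like these, a difference between two group means may reflect a handful of extreme latencies rather than a general delay, which is why a test based on the median is easier to defend.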

A question also may be raised regarding 
an interpretation Siipola placed on her data 
when the chromatic blots were found to be 
either more “pleasant” or “unpleasant” than 
the achromatic ones. She argued that such 
results indicated that colored blots were more 
“affect-laden” in the sense that they were 
more likely to arouse emotional attitudes dur- 
ing the process of responding (9, p. 366). 
Phillips and Smith (7) raised the question: if 
associative blocking and other disturbing emo- 
tional effects resulted from hue-form incon- 
gruity, how could Ss ever find such a process 
“pleasant”? Another interpretation is pos- 
sible; namely, that these affective states de- 
pended on whether the Ss were able or unable 
to give satisfactory responses to the cards. 
For example, with two cards equally saturated 
with a red hue, Siipola could not understand 
why one of these cards was found “pleasant,” 
the other “unpleasant.” Upon closer examina-
tion of her data, it was found that the “un-
pleasant” card elicited four times as many
rejected or deteriorated responses as did the 
“pleasant” card. Rather than utilize emo- 
tional attitudes during the process of respond- 
ing to explain these findings, it appears more 
likely that these attitudes resulted from suc- 
cess or failure to respond. Indeed, Lazarus 
and Oldfield (5) found precisely this to be 
true. 

One of the difficulties Siipola encountered 
was to explain why hue-form incongruity at 
times did not effect any differences in the 
content of the responses for the two groups. 
In resolving this problem, she was led to 
postulate that when a card is “highly struc- 
tured,” color exerts no influence since the 
form is too compelling. While this reasoning 
may be accused of being circular, some of her 
data contradict this notion in that so-called 
highly structured cards had unusually long 
reaction time scores. Why should this be so?
To argue that “Ss spent considerably more 
time searching in vain for a more congruent 
concept” at the same time you argue that 
highly structured forms are more compelling 
is to eat your pie and have it too. It is dif- 
ficult to measure card ambiguity using Ror- 
schach inkblots. Lazarus and Oldfield (5) 
tried to establish a criterion of nonambiguity 
by selecting those cards, regardless of whether 
they happened to be in a chromatic or achro- 
matic form, which elicited the same content 
in at least 77 per cent of the total responses 
obtained. We are not informed as to how 
many of the 36 cards met this criterion, nor 
why this cut-off point was chosen. 

In the present investigation, stimulus con- 
figurations were used which allowed a more 
varied yet thoroughly controlled use of hue, 
and at the same time provided for a clear 
determination of the degree of ambiguity of 
each card. The following hypotheses were 
then tested: 

1. The presence of hue in a configuration 
generally produces a significant delay in 
eliciting the first conceptual response. 

2. This delay in response is due to the 
incongruity that exists between the form and 
the hue of the figure perceived. 

3. Hue has a weak selective influence on 
the content of the response when the con-
figurations are highly structured or nonam-
biguous; but a strong, disruptive influence
when the figures are poorly structured or 
ambiguous. 


Materials and Procedure 


The configurations used were taken from 
the Closure Test² developed by C. M. Mooney
of McGill University. This test is similar to 
the Street-Gestalt Test to the extent that the 
figures are incompletely drawn in black and 
white, and Ss are required to perceive the ob- 
ject depicted in the configuration under re- 
duced cues. Figure 1 contains the 12 cards 
selected from Mooney’s test for the present 
study. These cards were chosen because norms 
indicated they represented varying degrees of 
ambiguity or structuredness. The index of 
difficulty was based on the proportion of Ss 
able to identify each figure correctly in the 
total configuration in the absence of color. 
If more than 50 per cent of the Ss were able 
to do this, the figure was called highly struc- 
tured; otherwise, it was called ambiguous. 
(The index of difficulty for each card appears 
in Table 2.) 
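
The classification rule can be written out as a short sketch. It assumes that the achromatic column of Table 2 is the index of difficulty referred to here; the dictionary and function names are illustrative and do not come from the original study.

```python
# Per cent of Ss identifying each figure correctly in the absence of color.
# Assumption: these are the achromatic ("Ach.") values reported in Table 2.
ACHROMATIC_PCT_CORRECT = {
    "Turkey": 22, "Shoes": 26, "Pig": 42, "Train": 58,
    "Telephone": 60, "Torso": 50, "Profile": 32, "Seated": 56,
    "Horse & Rider": 34, "Full Face": 80, "Logger": 66, "Couple": 86,
}


def classify(pct_correct: float) -> str:
    """Apply the rule in the text: more than 50 per cent means highly structured."""
    return "highly structured" if pct_correct > 50 else "ambiguous"


labels = {card: classify(pct) for card, pct in ACHROMATIC_PCT_CORRECT.items()}
ambiguous = sorted(card for card, label in labels.items() if label == "ambiguous")
print(ambiguous)
```

Under this assumption the rule reproduces the grouping used in the tables: six cards fall in each category, with the torso card (exactly 50 per cent) counted as ambiguous.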

Five sets of 12 figures each were used. 
Each set differed from the other sets in the 
hue applied to the white areas in each figure. 
Hues were chosen to represent the popular 
intervals of the color spectrum, and were red, 
green, yellow, and purple. The fifth set was 
achromatic. These 5 sets were identical in 
every respect except for the variation in hues 
from set to set. Each configuration measured 
4 inches by 6 inches and was mounted on a 
white cardboard background 7 inches by 10 
inches. 

Two hundred and fifty undergraduates of 
normal color vision, enrolled at the Univer- 
sity of Maine, were the subjects. Approxi- 
mately half were women. The Ss were ran- 
domly assigned to one of the five groups. 
Each S saw all 12 cards of a given hue. In this 
way, 50 Ss served in each of the 5 groups. 
Cards were presented in the upright position, 
one at a time, to each S. A sweep-second stop 
watch was started at the moment of card 


2 This test may be obtained for experimental pur- 
poses only from the Department of Psychology, Mc- 
Gill University. We wish to express our appreciation 
to this institution for providing us with several 
copies of this test. 
Fig. 1. The stimulus figures used in the experiment. They are (1) turkey on a plate, (2) three shoes, (3)
pig or cow, (4) locomotive, (5) telephone, (6) male torso, (7) profile of man’s face with hand on chin, (8)
figure seated on bench, (9) jockey on horse, (10) full face, (11) logger with pole, (12) couple or person
holding child.


presentation. It was stopped when S made
a definite, scorable response. The card was
then removed and the next one presented
until all 12 cards had been seen. This pro-
cedure was used with all Ss. Instructions
were:


You will be shown 12 incomplete pictures of 
common, everyday subjects or objects one at a 
time. They may be confusing when first seen be- 
cause some of the lines or shapes have been left out. 
However, as you go on looking at each card you 
should expect to see some popular object, figure, or 
subject matter. Now, some of these pictures are 
Table 1

Percentage of Ss in Each Group Offering Some Response At or Above the Median Reaction Time Score

                       Md       Per cent at/above Md R.T.      Chi
Cards                (sec.)   Ach.    P     R     Y     G     Square      p

Ambiguous
 1. Turkey            20.5     46    36    58    62    48      8.48    .05-.10
 2. Shoes             15.5     52    50    42    44    62      4.94    .20-.30
 3. Pig               20.5     42    46    54    52    56      2.72    .50-.70
 6. Torso             20.0     40    46    64    52    50      6.31    .10-.20
 7. Profile           12.8     66    42    42    62    46     10.62    .02-.05
 9. Horse & Rider     12.4     48    48    54    42    60      3.74    .30-.50

Highly Structured
 4. Train              8.0     48    48    50    52    56      1.89    .70-.80
 5. Telephone          6.3     46    52    36    50    64      8.22    .05-.10
 8. Seated            14.1     46    50    56    46    36      4.30    .30-.50
10. Full Face          6.1     50    48    46    50    58      2.10    .70-.80
11. Logger            13.4     46    62    54    52    46      3.54    .30-.50
12. Couple             7.0     42    60    42    50    62      7.31    .10-.20







more difficult than others, and it’s possible you may 
fail to see what is depicted. First impressions are 
probably best, so as soon as you recognize it, call 
it out so I may record your response. Have you any
questions?


A time limit of 30 seconds was allowed for 
each card. If S failed to respond in this time 
interval, the card was taken up and the next 
one presented. No comment was made as to 
the correctness of the response. 


Results and Discussion 


Reaction time. A statistical problem usually 
arises when one attempts to deal with latency 
of response data. In the first place, such data 
rarely are normally distributed; in the sec- 
ond place, there are always Ss who fail to 
make a response to some of the cards. The 
statistical technique uniquely suitable for 
this type of data is the nonparametric Median 
Test (6). Using this statistic to determine if 
hue alone influenced the response when con- 
tent is ignored, it was found that only Card 
7 differentiated the groups significantly (chi
square = 10.62, p = .02-.05). All other
cards failed to do so, as Table 1 indicates. 
Further analysis revealed that the yellow 
and achromatic cards were found to be sig- 
nificantly different from purple, red, and 
green, in that the latter hues had the shorter 
latencies. After determining the median re-
sponse time score for each card in each
group, a Median Test was computed for all 
12 cards as a group to determine whether hue 
had any over-all effect. Chi square was found 
to be 6.44, df = 4, p = .10-.20. Similar
treatment was accorded the cards when divided 
into the categories of ambiguous and highly 
structured, but no significant color influence 
could be established statistically. 
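
The logic of the Median Test for several groups can be sketched as follows. This is a minimal illustration, not the authors' computation: the function name is hypothetical, the sample latencies are invented, and counting a failure to respond as a latency beyond the 30-second limit is an assumption made here, since the article does not say how such cases were scored.

```python
from statistics import median


def median_test(groups, time_limit=30.0):
    """Chi square for a k-group Median Test.

    groups: list of lists of latencies in seconds. None marks a failure to
    respond and is treated as exceeding the time limit (an assumption).
    Returns (chi_square, degrees_of_freedom).
    """
    filled = [[time_limit + 1 if x is None else x for x in g] for g in groups]
    pooled = [x for g in filled for x in g]
    grand_median = median(pooled)

    # Count the scores at or above the pooled median in each group.
    at_or_above = [sum(x >= grand_median for x in g) for g in filled]
    sizes = [len(g) for g in filled]
    p_above = sum(at_or_above) / sum(sizes)
    if p_above in (0.0, 1.0):
        raise ValueError("all scores fall on one side of the median")

    chi_square = 0.0
    for hits, n in zip(at_or_above, sizes):
        expected_hits = n * p_above
        expected_misses = n * (1 - p_above)
        chi_square += (hits - expected_hits) ** 2 / expected_hits
        chi_square += ((n - hits) - expected_misses) ** 2 / expected_misses
    return chi_square, len(groups) - 1


# Hypothetical latencies for two small groups (not data from the study).
fast_group = [3, 4, 5, 6]
slow_group = [7, 8, None, None]
chi_square, df = median_test([fast_group, slow_group])
print(round(chi_square, 2), df)
```

Because the test uses only whether each score lies above or below the pooled median, it tolerates both skewed latencies and Ss who gave no response within the time limit.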

Since the above analyses were concerned 
with latency of response scores regardless of 
the conceptual response elicited, it was im- 
portant to analyze these latencies when the 
same correct conceptual response was given 
by the various Ss in the 5 groups. Such an 
analysis would control still further the in- 
fluence of color either as a disruptive or a 
facilitative force in bringing about a given 
response to the stimulus figures. Again, only 
Card 7 yielded a significant chi square at the 
.01 level of confidence. It was found that here 
the achromatic card had a significantly longer 
latency than either purple or red. 

From these results, it is apparent that 
stimulus hue generally did not delay responses 
as the first hypothesis postulated. In the two 
instances where significant differences were 
obtained, either hue facilitated the response, 
or two different hues had differentially sig- 
nificant effects. Since Card 7 is essentially an 
ambiguous card, the second and third hy-
potheses are not supported.

Content of response. Siipola contended “if 
a stimulus form is highly structured, color 
will normally have no effect upon the content 
of the conceptual responses regardless of 
whether the hue happens to be congruent or 
incongruent with the form-favored responses” 
(9, p. 374). On the other hand, if a stimulus 
form should be ambiguous, color could pro- 
duce novel concepts which otherwise were 
absent in the achromatic norm. Accordingly, 
the prediction follows that no differences 
should occur among the groups in their ability 
to recognize the figure depicted when the 
forms are highly structured; whereas, a dif- 
ference should be manifested in the number 
of correct identifications when the forms are 
ambiguous. Since the achromatic cards were 
“less constraining on the associative process,” 
they should elicit a greater proportion of cor- 
rect identifications than the chromatic cards. 

In Table 2, the data regarding the per- 
centage of correct responses obtained for each 
card among the various groups are presented. 
A chi-square test was computed for each card 
separately, and the corresponding p values 
indicate the levels of significance attained. 
Five cards yielded chi-square values below 
the .05 level of confidence. They were cards 
2 (shoes), 5 (telephone), 7 (profile), 11 
(logger), and 12 (couple). Two of these cards 
were ambiguous, while three were highly
structured. Apparently, degree of structure 
had little to do with this determination. 
Purple shoes were identified more often than 
yellow or red shoes; a profile in red was recog- 
nized more frequently than one in yellow; a 
red telephone was easier to see than a green 
one; a red logger was more difficult to see 
than a yellow or achromatic one; the couple 
was seen more clearly in red or achromatic 
than in purple. No particular color played a 
consistently superior or inferior role. 
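
A chi-square value of this kind can be recomputed from the reported percentages. The sketch below assumes 50 Ss per group and an ordinary 2 x 5 test of correct against incorrect identifications; the function name is illustrative, and the input is the Card 12 (couple) row of Table 2 in the order Ach., P, R, Y, G.

```python
def chi_square_from_percentages(pct_correct, n_per_group=50):
    """Chi square for a 2 x k table of correct vs. incorrect identifications.

    pct_correct: per cent correct in each of k equal-sized groups.
    Returns (chi_square, degrees_of_freedom).
    """
    counts = [round(p * n_per_group / 100) for p in pct_correct]
    k = len(counts)
    expected_correct = sum(counts) / k
    expected_incorrect = n_per_group - expected_correct

    chi_square = 0.0
    for correct in counts:
        incorrect = n_per_group - correct
        chi_square += (correct - expected_correct) ** 2 / expected_correct
        chi_square += (incorrect - expected_incorrect) ** 2 / expected_incorrect
    return chi_square, k - 1


# Card 12 (couple), Table 2: Ach., P, R, Y, G.
chi_couple, df_couple = chi_square_from_percentages([86, 68, 94, 78, 74])
print(round(chi_couple, 2), df_couple)
```

For this row the result agrees closely with the tabled value of 13.01 on 4 degrees of freedom.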

When the cards were analyzed as groups 
based on degree of structure, significant dif- 
ferences were not found either for the highly 
structured cards or the ambiguous cards. 
These findings do not support Siipola’s hy- 
pothesis regarding the effects of color on highly 
structured or ambiguous cards. 

In Rorschach practice, the clinician always 
is concerned with determining the relative 
influences of form and color in eliciting every 
conceptual response. In analyzing the data in 
this study for content, the responses obtained 
for each card by all Ss were listed and com- 
pared. In substance, despite the variations in 
color used, rarely did any color differentially 
influence the content of the response. While 
there was a variety of content elicited, the 
kinds of content found in the achromatic 
cards generally turned up in all the chromatic 
groups. In only one single instance did a 


Table 2

Percentage of Ss in Each Group Who Were Able to Perceive the Correct Figure in Each Card

                            Per cent correct responses        Chi
Cards                Ach.    P     R     Y     G     Square      p

Ambiguous
 1. Turkey            22    10    20    12    10      6.01    .10-.20
 2. Shoes             26    36    16    14    20     10.98    .02-.05
 3. Pig               42    44    28    28    38      5.04    .20-.30
 6. Torso             50    40    36    40    52      4.00    .30-.50
 7. Profile           32    40    48    20    38      9.48    .05
 9. Horse & Rider     34    42    38    50    30      4.97    .20-.30

Highly Structured
 4. Train             58    36    46    54    46      5.80    .20-.30
 5. Telephone         60    50    70    50    46     18.61    .001
 8. Seated            56    54    38    50    36      6.86    .10-.20
10. Full Face         80    72    82    76    74      1.92    .70-.80
11. Logger            66    48    40    66    50      9.44    .05
12. Couple            86    68    94    78    74     13.01    .02-.03
Table 3

The Percentage of Ss in Each Group Who Failed to Offer Any Response to Each Card

                            Per cent rejected responses       Chi
Cards                Ach.    P     R     Y     G     Square      p

Ambiguous
 1. Turkey            30    20    42    36    30      6.18    .10-.20
 2. Shoes             24    16    26    24    32      3.55    .30-.50
 3. Pig               36    28    46    44    46      5.17    .20-.30
 6. Torso             24    36    44    34    24      6.64    .10-.20
 7. Profile           34    18    16    30    22      5.28    .20-.30
 9. Horse & Rider     32    16    32    30    32     12.23    .01-.02

Highly Structured
 4. Train             16    14    14    14    14      0.13    .98-.99
 5. Telephone         14    12    10    24    20      5.06    .20-.30
 8. Seated            22    22    36    30    36      4.76    .30-.50
10. Full Face         14    18    16     8    20      3.29    .50-.70
11. Logger            16    18    26    24    20      1.87    .70-.80
12. Couple             6     6     4     8    14      4.84    .30-.50





given concept appear with significantly greater 
frequency in the chromatic version. On Card 
1, when seen in green, 16 Ss called it a “leaf” 
compared to 4 Ss in each of the other groups 
who saw it as a “leaf.” This was the only 
instance where hue pulled a conceptual re- 
sponse in its direction. 

It is important to note that while the ma- 
terials used in this experiment differed from 
Siipola’s blots, and the Ss were told that a 
definite figure could be seen on each card, 
the essential task confronting Ss in both ex-
periments was equivalent. While Siipola in-
structed her Ss that “this is a test of your
imagination,” she also instructed them “. . . I
would like to have you tell me exactly where
and how you imagined what you did. . .” It
is a mistake to think that the Rorschach test
is a fantasy production in the sense that the 
reality features of the inkblots offer no con- 
straining rein on what Ss see. Even though 
the materials used in the present experiment 
are not inkblots, it was found that on the 
highly structured cards as many as 9 different 
concepts were given by Ss, while on the am- 
biguous cards the average for each card was 
11 different concepts. 

In the Siipola study, it was reported that 
the chromatic blots were rejected twice as 
often as the achromatic blots. In Table 3, no
statistically significant differences among the
groups as to the number of rejections elicited
were found for the highly structured cards.
With the ambiguous cards, only Card 9 
(horse and rider) differentiated the groups at 
the .01-.02 level of confidence. The purple 
hue contributed fewer rejections than the 
other hues, which in turn did not differ from 
each other. These data indicate that rejec- 
tions do not necessarily occur more frequently 
because hue happens to be present. 

In view of the results of this experiment, 
it is the writers’ opinion that with normally 
functioning Ss, form factors are far more 
significant in determining what is seen in a 
configuration than hue. 


Summary and Conclusions 


This study tested three hypotheses devel- 
oped by Siipola to explain the alleged dis- 
turbance observed in some Ss clinically when 
viewing the chromatic Rorschach inkblots. 
These hypotheses are: 

1. The presence of hue in a configuration 
generally produced a significant delay in 
eliciting the first conceptual response. 

2. This delay is due to the incongruity 
which sometimes exists between the form and
the hue of the figure. 

3. Hue has a weak, selective influence on 
the content of the response in highly struc-
tured figures, but a strong disruptive one in
ambiguous figures. 

Incomplete figures taken from Mooney’s 
Closure Test were used. This created the ad- 
vantage of using figures which varied in de- 
gree of structure or ambiguity and allowed 
for a more varied and controlled use of hue 
which could be compared with the achromatic 
version. Five sets of 12 figures each were 
shown individually to 250 Ss randomly as- 
signed to one of the 5 groups. Since all sets 
were identical except for the variation in the 
hue of the figures, color influences as an in- 
dependent variable could be measured. 

Analyses of the data led to the rejection of 
all three hypotheses of Siipola. No significant 
differences were found among the groups on 
latency of the first conceptual response elicited 
except for one card where the achromatic 
form had the longer latency as compared with 
red and purple. Hue-form incongruity could 
not explain these differences. Hue had no sig- 
nificant influence on the content of the re- 
sponse as a function of the degree of card 
ambiguity. Also, the chromatic cards were not 
rejected significantly in larger numbers than 
were the achromatic cards. 

In brief, the findings of this investigation 
contradict the conclusions reached by Siipola
to explain the clinically observed disturbances
of some Ss when viewing the chromatic Ror- 
schach inkblots. 


Received May 5, 1955. 
Reliability of the Blacky Test¹


Sol Charen 


Catholic University²


In a recent investigation of the concept of 
regression (2) use of the multiple-choice ques- 
tions in the Blacky test by Blum gave equivo- 
cal results. The research involved the test- 
retest of tuberculous patients at a period rela- 
tively early in their illness and again four 
months later in the final stage of recovery. 
The Blacky was the only instrument, in a 
battery of fifteen paper-and-pencil tests and 
the Rorschach, which gave a conflicting pic- 
ture with patients either showing no evidence 
on it of improvement or definite evidence of 
regression on recovery. 

The Blacky questions either tapped per-
sonality levels not reached by the other tests
or did not have good reliability. The first as-
sumption can be challenged by results of the
Rorschach which showed only minor group 
changes when patients recovered. Even if 
Rorschach and Blacky tests are assumed not 
to tap the same levels of personality the re- 
gression in subjects which was shown by the 
latter test should have been reflected by 
changes in defenses or in ego structure and 
these changes would be picked up by the 
Rorschach. 


1 An extended report of this study may be ob-
tained without charge from Sol Charen, 200 Rhode
Island Avenue, N.E., Washington 2, D. C., or for a 
fee from the American Documentation Institute. Or- 
der Document No. 4740 from ADI Auxiliary Publi- 
cations Project, Photoduplication Service, Library of 
Congress, Washington 25, D. C., remitting in ad- 
vance $1.25 for microfilm or $1.25 for photocopies. 
Make checks payable to Chief, Photoduplication 
Service, Library of Congress. 

2 Now at the Montgomery County Mental Hy-
giene Clinic, Rockville, Maryland.


Measurement of reliability of the Blacky 
was possible in view of the test-retest situa- 
tion and evidence from the other tests that 
no basic personality changes occurred during 
the four month period of recovery. Scoring 
of the multiple questions accompanying each 
Blacky cartoon was by Blum’s criteria (1) 
with each subject scored “weak” or “strong” 
in various psychosexual areas. With this di-
chotomy of results a point distribution can
be assumed and therefore rφ (the phi coef-
ficient) was the correlation coefficient used
to determine reliability (3, p. 179).
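
For a strong/weak dichotomy on test and retest, a reliability coefficient can be computed from the fourfold table. The sketch below assumes that the coefficient intended is the phi coefficient for a point distribution; the function name, the cell labels, and the counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2 x 2 test-retest table.

    a: strong on test and strong on retest
    b: strong on test, weak on retest
    c: weak on test, strong on retest
    d: weak on test and weak on retest
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return (a * d - b * c) / denominator


# Hypothetical counts for one psychosexual area (not data from the study).
r_phi = phi_coefficient(12, 8, 7, 13)
print(round(r_phi, 3))
```

A value near zero or below zero means that a subject's classification on retest could hardly be predicted from the classification on the first testing.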

The highest rφ obtained was .519 for Cas-
tration Anxiety (Picture IV), with the remain-
ing ones low or negative. Allowing for the
relatively lower values of rφ, the test-retest
correlations are still low enough to suggest
that one reason for the equivocal results ob-
tained with the Blacky multiple-choice ques-
tions in this study of regression in tubercu-
lous patients might be the low reliability of
this test.


Brief Report. 
Received October 17, 1955. 
Discriminative Powers of Rorschach Determinants
in Children Referred to a Child Guidance Clinic¹


Kennison T. Bosquet
Providence Child Guidance Clinic and Brown University

and Walter C. Stanley
Brown University


The present study sought to answer two
questions: (a) What is the relationship be-
tween the main Rorschach determinants and
age in children, aged 7 through 13, who have
been referred to a child guidance clinic for
treatment? (b) How well do the Rorschach
determinants discriminate between such re-
ferred children and the “normal” children
studied by Ames et al. (1)?

Though normative in character, our first
question stemmed also from certain conven-
tions which are used in the interpretation of
children’s Rorschach records. These conven-
tions consist of statements about changes in
determinants as a function of age in normal
children. Yet the literature contains relatively
few normative studies, and those which have
been published tend to be inadequate, par-
ticularly for ages 7 through 13. Thus, the
present study also raised a theoretical ques-
tion: Do changes in the determinants, as-
sumed to take place in normal development,
occur in children referred to a clinic? These
hypothesized changes may be stated as fol-
lows: (a) there should be an increase in fre-
quency with age in M, FC, F + %, P, and
R; and (b) there should be a decrease in fre-
quency with age in FM, CF, C, and A%.

Our interest in the second major question
was also twofold. By comparing the clinic
children with those studied by Ames, we
would have some basis for evaluating the
generality of our findings on age as well
as information concerning the discriminative
properties of Rorschach determinants.


1 This report is based on an MS. thesis submitted
to Brown University by the first author. The second
author served as advisor.


Method


The Rorschach protocols of 175 boys who
had been referred to the Providence Child
Guidance Clinic for diagnosis and treatment
were analyzed. Twenty-five Rorschach rec-
ords for each of the ages 7 through 13 were
selected at random from the Clinic files, con-
taining Rorschach protocols which had been
obtained and scored by the senior author as
part of the routine testing program of the
Clinic.

The records were scored primarily accord-
ing to the Beck (2) system, with certain
adaptations from Klopfer’s (5) method. The
main differences from Beck were the use of
animal movement responses (FM), inanimate
movement responses (m) and Klopfer’s F%,
the percentage of form responses in the total
number of responses. All records were ana-
lyzed for the following 15 variables: R, M,
FM, m, Total M, FC, CF, C, Sum C, Shad-
ing, H%, A%, P, F%, and F + %.

Intelligence ratings were available on 172
of the 175 cases. The ratings were mostly
Children’s Wechsler IQ’s; a few were Binet
IQ’s. The mean IQ was 98.55; the sigma,
17.98. The distribution was tested for skew-
ness and found to be not significant. Thus
the sample used in this study is representa-
tive of the general population in terms of
tested intelligence.

One hundred fifty-three of the fathers of
children in this study were alive. Chi-square
Table 1

Medians and Quartiles of Scores on Major Rorschach Variables

                                        Age
Variable    %ile      7      8      9     10     11     12     13

M             75      2      2      4      3      2      2    2.5
              50      1      1      2      2      1      1      1
              25      0      0      1      0      0      0      0
FM            75      2    3.5      4      4      3      6      3
              50      1      2      3      2      2      2      2
              25      0      1      1      0      1      1      1
m             75      0      1      1      0     .5      0      1
              50      0      0      0      0      0      0      0
              25      0      0      0      0      0      0      0
Total M       75      4      7      8    7.5      6      7    6.5
              50      2      3      5      4      3      3      4
              25      1      1      2    1.5    1.5      2      2
FC            75     .5      1      1      1      1      2     .5
              50      0      0      0      0      0      1      0
              25      0      0      0      0      0      0      0
CF            75    2.5      2      2      2      1    1.5      1
              50      1      1      1      1      0      0      1
              25      1      0      0      0      0      0      0
C             75      0      0      0      0      0      0      0
              50      0      0      0      0      0      0      0
              25      0      0      0      0      0      0      0
Sum C         75      3      3    3.5      2      2   2.25      2
              50      2      2    1.5      1     .5      1      1
              25      1    .75    .25     .5      0     .5      0
Shading       75      2      2    2.5      2    1.5      2      2
              50      1      1      2      1      1      1      1
              25      0      0      0      0      0      1      0
H%            75     27   25.5     25   19.5     19     19   22.5
              50     13      8     14     13     15     11     15
              25    7.5      0      8      8    7.5      7      7
A%*           75   64.5   61.5   67.5   67.5   71.5     70     75
              50     50     52     50     62     62     60     63
              25     36     43   43.5     46   44.5   55.5   54.5
P             75      4    3.5    6.5    6.5      8      7      7
              50      3      3      5      5      5      5      5
              25      2      2      3      3    3.5      3    3.5
F%            75     73   65.5     68     71   74.5     70     78
              50     64     58     54     59     61     52     58
              25   46.5   38.5     32     42     54     44   52.5
F+%**         75     75     74     84     88     89     85     88
              50     63     65     75     80     80     71     79
              25     55   46.5     62     65   60.5     67     66
R             75     17   20.5     25   23.5     21     23     22
              50     15     16     18     17     18     17     16
              25     11   11.5   13.5   13.5     13     12   10.5

* Chi-square median test p value between .02 and .01.
** Chi-square median test p value less than .01.
tests revealed that they did not differ sig-
nificantly in socioeconomic status from em-
ployed males in the state of Rhode Island,
as listed in the 1950 census reports (7). 


Results 


Since few of the distributions of Rorschach 
scores were normal within age groups, chi- 
square tests of independence were used to 
test null hypotheses. The over-all median of 
the 175 scores for each variable was used to 
divide each age level into two groups, one be-
low the median and one above. A 2 × 7 table
was thus obtained for each variable. The hy- 
pothesis tested for each variable was that the 
age groups came from a population with a 
common median. 

Table 1 gives the median and quartiles for 
each age group on each Rorschach variable 
and summarizes the results of these chi-square 
comparisons. Three variables achieved sta- 
tistical significance. These are P, F + %, and 
A%, all of which increased with age. 


Comparison with Ames’s Data 


The study of Ames et al. (1) was made on
a group consisting primarily of children of
the research group which had been followed
for many years at the former Yale Clinic of 
Child Development. This group was supple- 
mented with children of similar intelligence 
and social class levels from the Connecticut 
cities of New Haven and Waterbury. Fifty 
children, 25 boys and 25 girls, were seen at 
each of 13 different age levels. The children 
were considered to be a “normal research 
group.” Only those in the age range 7 through 
10, a total of 200 children, were used for 
comparison with the Clinic sample of the 
present study, as these were the only ages in 
which the two groups overlapped. 

The distribution of intelligence ratings in 
the Ames group shows three-fourths of the 
children to be above the “average” category, 
with the median and modal ratings being 
“superior.” The Clinic children, on the other 
hand, show a fairly normal distribution with 
the median and mode within the “average” 
category. The two groups differ significantly 
in intelligence, the chi-square p value being 
less than .01, with the Ames group being
much higher in rating. 

The Ames group was a “high-level popu- 
lation with more than half the children in the 
professional group, and three-fourths in the 
professional and managerial groups” (1, p. 
24). Ratings were based on the Minnesota 
Scale of Paternal Occupations (4). The Clinic 
children rated on the same scale show only 
15 per cent in the professional and mana- 
gerial groups and about three-fourths in 
groups comprising clerical, retail business, 
skilled and unskilled trades. The hypothesis 
that the clinic and the Ames groups were 
drawn from the same socioeconomic popula- 
tion can be rejected, the chi-square p value 
being less than .01. 

Ames’s data on Rorschach variables in- 
clude both boys and girls, while data of the 
present study are limited to boys. However, 
Ames comments, “Sex differences are mini- 
mal at most of the individual ages studied 
and are not for the most part consistent from 
age to age” (1, p. 288). Accordingly, the 
Clinic and Ames groups were compared on 
the 10 Rorschach variables where scoring 
was comparable by means of a chi-square 
analysis. The 10 variables were M, FM, FC, 
CF, C, Sum C, H%, A%, F%, and R. 

The hypothesis tested at each age level for 
each variable was that dividing the Clinic
and Ames age groups on the basis of the 
Ames age group medians yields frequencies 
independent of the Clinic-Ames classification. 
The 10 variables compared at the four ages 
gave 40 chi squares, of which three had p 
values between .02 and .01, and 37 had p 
values of .20 or more. For the three “signifi- 
cant” differences, the Clinic group had more 
FM responses at age 9; a lower F% at age 
9, and a higher A% at age 10. Inspection of 
Wilkinson’s (6) tabled values for obtaining 
p values of .05 and .01 in a set of four independent
tests reveals that these three chi
squares are indicative of “real” p values of 
approximately .05. However, it should be 
mentioned that in the case of F% and A%, 
the direction of difference between the Clinic 
and the Ames age groups was not the same 
for the four age comparisons, thus prevent- 
ing any simple interpretation of even these 
“corrected” p values. 
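
The correction for multiple tests discussed above can be illustrated with a binomial tail probability. The sketch below does not reproduce Wilkinson's tables; it assumes the tests are independent and computes the chance of obtaining k or more nominally significant results among n tests when every null hypothesis is true. The function name and the example inputs are illustrative assumptions.

```python
from math import comb

def prob_at_least_k(n, k, alpha):
    """Chance of k or more 'significant' results among n independent tests
    when every null hypothesis is true and each test uses level alpha."""
    return sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
               for i in range(k, n + 1))

# One chi square with a nominal p between .01 and .02 in a set of four tests:
# bracket the set-wise probability by evaluating both nominal levels.
lower = prob_at_least_k(n=4, k=1, alpha=0.01)
upper = prob_at_least_k(n=4, k=1, alpha=0.02)
print(round(lower, 3), round(upper, 3))
```

Under these assumptions the set-wise probability falls between roughly .04 and .08, which is in line with the "real" p value of approximately .05 mentioned in the text.
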

A second set of chi-square tests of independence
of the Clinic-Ames classifications
was based on the number of children using a 
determinant one or more times. In this com- 
parison only the Ames data for boys were 
compared with the Clinic group since the 
Ames data were broken down according to 
sex. However, the number of variables in- 
volved was smaller, as only those so listed 
and scored comparably by Ames could be 
used. The variables were M, FM, FC, CF, 
and C. 

These five variables compared at four different
age levels gave 20 chi squares, of which
one (that for C, Age 7) had a p value of
about .02, and 19 had p values greater than
.05. The chi square for C, Age 7, is not reliable
as one such value in a set of four.

In a further comparison based on children 
using C one or more times, the four age 
groups were lumped and the Clinic boys were 
compared with the Ames boys in a single 
2 x 2 table. This comparison yielded a chi 
square with a p value of about .02, the Clinic 
group having the higher scores. 
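
A comparison of this kind can be sketched as a chi-square test on a 2 x 2 table. The counts below are hypothetical, since the source does not report the cell frequencies, and the function applies no continuity correction, which may differ from the authors' procedure.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) and its p value
    for the 2 x 2 table [[a, b], [c, d]]; df = 1."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For df = 1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: rows = Clinic boys, Ames boys;
# columns = used C one or more times, never used C.
chi2, p = chi_square_2x2(30, 70, 15, 85)
print(round(chi2, 2), round(p, 3))
```

With these invented counts the p value falls between .01 and .02, the same order of magnitude as the value reported for the pooled comparison.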

Discussion 

It is evident from the findings with the 
Clinic children that the 15 Rorschach vari- 
ables analyzed showed a marked insensitivity 
to the age variable. Significant p values ob- 
tained only for P, F + %, and A%, each of 
which showed increases with age. It should be 
noted, however, that the first two of these 
variables are statistical in derivation and are 
in this study based on adult norms. 

The third significant increase (A%) is just 
opposite to the conventional assumption in 
regard to “normal” children, and might be 
explainable by the nature of our sample— 
referred children. However, when this clinic 
group was compared with the Ames sample, 
there was no significant over-all difference. 
This suggests that even with “normals,” A% 
may not decrease as commonly assumed 
throughout the ages studied. 

With regard to the nine conventional as- 
sumptions concerning the effect of age on de- 
terminants which were listed in the introduc- 
tion, the data on the Clinic group affirm the 
applicability of two (that P and F+%
should increase), negate the applicability of
one (that A% should decrease), and provide 
neither confirmation nor negation for the remaining
six conventional assumptions.

Again, one way to treat these findings
would be to dismiss them on the ground that 
our subjects came from a Clinic population, 
hence were abnormal in personality develop- 
ment. An alternative explanation, however, is 
suggested by the striking similarity in Ror- 
schach scores between the Clinic group and 
Ames’s “normal” research group. These two 
groups, it will be recalled, differed markedly 
in intelligence, in socioeconomic status, and 
in the fact of referral to a child guidance 
clinic, yet the only indication of a real over- 
all difference came from one of the three 
analyses involving C. The alternative impli- 
cation of our findings, then, is that a simple 
“sign” approach to the interpretation of chil- 
dren’s Rorschach responses is relatively insen- 
sitive either to age or to differences between 
markedly contrasting groups of children. This 
conclusion is further buttressed by the fact 
that the mean value for F + % for 7-year- 
old Clinic children was 65.84. Bochner and 
Halpern (3) state that an F + % of 65 is to 
be expected in a child just reaching school
age.


Summary


The Rorschach protocols of 175 referred
boys, 25 at each age, 7 through 13, were analyzed
for changes in 15 Rorschach determinants.
Only three, P, F + %, and A%, showed
significant changes with age, all three increasing
with age. Insofar as scoring procedure
was comparable, scores on the Rorschach determinants
for this Clinic sample were compared
with the scores reported by Ames et al.
for a “normal research group.” Although the
Clinic sample was significantly poorer in
intelligence and socioeconomic status, its Rorschach
responding, taken as a whole, was
strikingly similar to that of the Ames group.
It was suggested that the major implication
of the present study was that the “sign”
approach to the interpretation of children’s
Rorschach responding is a relatively insensitive
procedure.


Received May 9, 1955. 


Rorschach Determinants in Clinic Children 21 


References

1. Ames, L. B., Learned, J., Metraux, R. W., &
Walker, R. N. Child Rorschach responses.
New York: Hoeber, 1952.

2. Beck, S. J. Rorschach’s test. Vol. I. Basic processes.
(2nd Ed.) New York: Grune & Stratton, 1949.

3. Bochner, Ruth, & Halpern, Florence. The clinical
application of the Rorschach Test. New York:
Grune & Stratton, 1945.

4. Goodenough, Florence L., & Anderson, J. E.
Experimental child study. New York: Century, 1931.

5. Klopfer, B., & Kelley, D. The Rorschach technique.
Yonkers, N. Y.: World Book Co., 1946.

6. Wilkinson, B. A statistical consideration in
psychological research. Psychol. Bull., 1951, 48,
156-158.

7. U. S. Census of Population, 1950. Report No.
P-B39, Vol. II, Part 39, Ch. B, RI. U. S.
Government Printing Office, 1952.








Journal of Consulting Psychology 
Vol. 20, No. 1, 1956 





Word Association Frequency Tables of
Mentally Retarded Children¹


Edmund M. Horan 
New York, N.Y. 


No adequate word association frequency 
tables have been available for the measure- 
ment of group contact in children. The chil- 
dren’s tables published in 1916 by Woodrow 
and Lowell reported the responses of children 
as a group with no separation into age groups. 
Moreover, the method employed in their in- 
vestigation was not oral but written. 

The oral responses of 732 mentally re- 
tarded children, or a total of 73,200 responses, 
are reported in this study. The frequency 
tables, based on the Kent-Rosanoff stimulus 
set of 100 words, have been arranged by 
chronological age, nine through fourteen, in- 
clusive. The children were selected from the 
class personnel sheets at the Bureau for Chil- 
dren with Retarded Mental Development in 
New York City. The IQ’s of these children, 
based on individual examinations by psy- 
chologists from the Bureau of Child Guidance, 
ranged from 40 to 75. The median IQ was 
69. The children were all native born, with 
the exception of 13 children born abroad. 
Classes for children with retarded mental de- 
velopment in 97 New York City public schools 
took part in this investigation administered 
during the spring of 1954. 

In comparing agreement of favorite responses
between children of average intelligence
(Woodrow and Lowell, written responses)
and adults (Kent and Rosanoff, oral responses),
and between the retarded
children of this study and adults, the favorite
responses of the retarded children overlapped
more with those of the adults. In this comparison,
the tenth and twelfth years of the
retarded children seemed critical because of
the significant increase made in the number
of their favorite responses toward the adult
level.


1 An extended report of this study may be obtained
without charge from Edmund M. Horan, 529
West 111 Street, New York 25, N. Y., or for a fee
from the American Documentation Institute. Order
Document No. 4673 from ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting in advance
$1.75 for microfilm or $2.50 for photocopies.
Make checks payable to Chief, Photoduplication
Service, Library of Congress.

The present investigation found a much 
closer similarity between children’s associa- 
tions and those of adults than did Wood- 
row and Lowell. Differences do exist but the 
differences are not found in the manner of 
associating, but, rather, in the tendency of 
children to offer a greater number of unique 
responses, to use more different words in re- 
sponse to a stimulus and, at times, to be 
unable to respond to a stimulus word. 

Word meanings play a part in the forma- 
tion of associations, but not as important a 
part as has been thought. Responses by op- 
posites appear as favorite responses at a 
much younger age than has been believed. 
High written-word frequency does not ap- 
pear to give a stimulus word any special 
power in word association. 

A comparison of the responses of each age 
group of retarded children demonstrates a 
uniform decrease from nine through fourteen, 
inclusive, in the number of failures of re- 
sponse and unique responses, and a concur- 
rent increase in the number of common re- 
sponses, i.e., responses with a frequency of 
two or more. 


Brief Report 
Received August 2, 1955. 


Examiner Influence in a Testing Situation¹


Thomas A. Wickes, Jr. 


Purdue University 


A number of investigators (1, 2, 3, 5, 6) 
have studied examiners and the effects ex- 
aminers have on the results they elicit from 
examinees. These studies have been done 
within different contexts and from different 
points of view but they all point up the fact
that the examiner can, in various ways, affect
the results he gets from examinees.
Nevertheless, some examiners tend to treat 
certain aspects of the testing situation as if 
they were unimportant and thus disregard 
their possible effects upon gathered data. The 
present study was designed to see whether or 
not test results could be modified by some 
aspects of the testing situation which are 
sometimes treated as if they were unimportant.
The experimental hypotheses tested
were: (a) test results will be significantly
modified by the perfunctory, verbal comments,
“Good,” “Fine,” and “All right.” (b)
test results will be significantly modified by 
the perfunctory, nonverbal actions of smiling, 
nodding the head, and leaning forward in 
the chair. 


Procedure 


Thirty-six male students ranging in age 
from 18 to 23 years were used in the study. 
These students were enrolled in introductory 
psychology classes at Purdue University. Stu- 
dents were placed in one of three groups, each 
group consisting of twelve subjects. Hypothesis
a was tested in one group; hypothesis b
was tested in another group; and the third
group served as a control. 


1 This paper is based on a thesis submitted in par- 
tial fulfillment of the requirements for the degree of 
Master of Science, to the Graduate School of Purdue 
University. The study was directed by Prof. A. W. 
Landfield. 

Two examiners were used in the experiment.
Each examiner tested six subjects in
every group. The author and the second examiner
both had similar training and experience
in psychology and in scoring inkblot responses
via Klopfer’s method. Both examiners
practiced voice inflections, posture, and movements
so that these conditions would be as
nearly equivalent as possible in the experiment.

“Test results” in the experiment were defined
in terms of movement (M) responses,
i.e., the inkblot-induced perception of action
in humans (excluding internal parts) and/or
human-like action in animals or parts of animals.
The test instrument was a set of 30
achromatic inkblots, devised by the author
and several of his colleagues, which had been
selected by means of a pilot study from an
original series of 120 blots. These 30 inkblots
consisted of two series of 15 blots each
numbered 1A, 2A, ... 15A and 1B, 2B, ... 15B.


Both A and B series of blots were composed of
five blots which elicited movement responses between
45 per cent and 50 per cent of the time in the
pilot study and ten blots which elicited movement
responses between 20 per cent and 25 per cent of the
time in the pilot study. Those five blots in each
series which elicited the most movement were presented
first.

In the testing situation, the subject was conducted 
to a small well-lighted testing room and was asked 
to seat himself in a chair at the right side of the 
desk, adjacent to, and facing the examiner. After 
certain preliminaries had been completed (asking the 
subject to withhold questions, checking him off the 
appointment sheet) the instructions were read to him 
as follows: “I am going to show you some cards 
with ink blots on them. I want you to tell me what 
one thing it looks most like to you, or reminds you 
of. Please look at each card carefully, turning it so 
that each of the four sides is toward you.” At this 
point, the experimenter gave a demonstration in 
which he picked up a blot and looked at it from
each of the four sides. “Take all the time you wish, 
then tell me the clearest thing you see. When you 
finish, hand me the card and I will write down your 
answer.” 

After the subject indicated that he understood the 
instructions, cards 1A to 15A were presented to him, 
one at a time. After each response, the subject 
handed the card to the examiner who then recorded 
the response. During the presentation of the first 15 
cards, the examiner remained in one position as 
nearly as possible, moving only his right arm to 
handle the cards and to record the response. Sub- 
jects in all three groups were treated in as nearly 
identical a manner as possible during the adminis- 
tration of the first half of the thirty-card set. How- 
ever, during the second half of the thirty-card set 
(i.e., cards 1B to 15B) variation was introduced in 
two of the three groups. 


In the first experimental group or the verbal 
condition group the examiner made a com- 
ment after each M response, starting with 
card 1B. With the first M response the ex- 
aminer said, “Fine.” On the next M re- 
sponse the examiner said, “Good.” After the 
third M response the examiner said, “All 
right.” These comments were repeated in that 
order after each M response throughout the 
remainder of the series. No other comment 
was made and the examiner controlled his 
body movements except for tendering the 
card, recording the response, and receiving 
each card as it was returned. 

In the second experimental group or non- 
verbal condition group, the examiner modi- 
fied his behavior after each response judged 
to be M, starting with card 1B. After the 
first M response, the examiner nodded his 
head three times. On the next M response the 
examiner smiled. With the third M response, 
the examiner leaned forward in the chair right 
after the response, recorded it, and returned 
to the initial position. These behavioral modifications
were repeated in that order after
each M response throughout the remainder of 
the series. No other movement was made and 
the examiner made no comments unless asked 
a direct question and an answer was un- 
avoidable. 

In the control group, all thirty cards, 1A 
through 15B, were administered to each sub- 
ject in a manner as nearly identical as pos- 
sible. 

Immediately after the subject had re- 
sponded to the last card, the examiner asked 
him to fill out a short questionnaire. The pur- 
pose of this questionnaire was to get a rough 
estimate of the subjects’ attitudes toward the 
test and the experimenter. The subjects’ 
anonymity was safeguarded in this phase of 
the experiment. After the subject had filled 
out the questionnaire, the examiner inter- 
viewed him to determine whether or not he 
had been aware of the nature of the experi- 
ment. It may be inferred from the interview 
data that none of the subjects was cognizant 
of the nature of the experiment, at least at 
the verbal level. 


Results 


Prior to analyzing the data, the author ex- 
cluded four subjects who gave no M responses 
to the last 15 cards under the verbal and non- 
verbal conditions. The total number in each 
group was then eight in the verbal condi- 
tion group, eleven in the nonverbal condition 
group, and twelve in the control group. 

In analyzing the data, the procedure was
to compare the mean number of M responses
elicited by the first 15 cards to the mean
number of M responses elicited by the second
15 cards. This was done for all three groups.
As recommended by McNemar (4, pp. 225-226),
the test of significance used made allowance
for the fact that the two sets of
scores were not random with respect to each
other. The appropriate statistic in this case
was the t test of difference between correlated
means. Table 1 gives the results of this t test.


Table 1

Difference Between Correlated Means t Test for Verbal Condition,
Nonverbal Condition, and Control Condition

                            SD of distr. of               Significance level
                            diffs. between    Computed
Group                  N    paired scores     t           p<.025    p<.005

Verbal condition        8       3.04           2.90        2.36      3.50
Nonverbal condition    11       1.25           7.47        2.23      3.17
Control condition      12       1.63            .71        2.20      3.11


Table 2

Difference Between Uncorrelated Means t Test for Examiners in Verbal Condition,
Nonverbal Condition, and Control Condition

                                 Mean of M diff.                Significance
Condition and          No.       score for each    Computed     level
examiner               tested    examiner          t            p<.05

Verbal condition
  Examiner I             5           4.40          [illegible]  [illegible]
  Examiner II            3           1.67
Nonverbal condition
  Examiner I             7           2.71          [illegible]  [illegible]
  Examiner II            4           3.00
Control condition
  Examiner I             6            .83          [illegible]  [illegible]
  Examiner II            6           1.67
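
The correlated-means procedure described above can be sketched as follows. The subject scores are hypothetical, because the source reports only summary values, and the sketch assumes the standard deviation of the differences is computed with N - 1 in the denominator, which may not match the original computation.

```python
from math import sqrt
from statistics import mean, stdev

def correlated_means_t(first, second):
    """t for the difference between correlated (paired) means, df = N - 1."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical M counts per subject: cards 1A-15A, then cards 1B-15B.
first_half = [2, 3, 1, 4, 2, 3, 2, 1]
second_half = [5, 4, 3, 9, 4, 8, 3, 4]
t, df = correlated_means_t(first_half, second_half)
print(round(t, 2), df)
```

The computed t is then compared with the tabled value for the same degrees of freedom; for N = 8, Table 1 lists 2.36 at the .025 level.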

As can be seen from Table 1, the differ- 
ence between the mean number of M re- 
sponses seen in the first 15 cards and the 
mean number of M responses seen in the 
second 15 cards in the verbal condition group 
was significant beyond the .025 level. The 
difference between the mean number of M re- 
sponses seen in the first 15 and the last 15 
cards in the nonverbal condition group was 
significant beyond the .005 level. There was 
no significant difference between the first and 
last halves of the cards in the control condi- 
tion group. 

To gauge the effect the examiners, per se, 
might have had in eliciting M responses from 
the subjects, a t test was executed between
the examiners. Essentially, this was a t test
between uncorrelated means such as is out- 
lined in McNemar (4, Ch. 12). It involved 
testing the hypothesis that there was no sig- 
nificant difference between the means of the 
M difference scores obtained from the vari- 
ous subjects tested by each examiner in each 
group. Table 2 gives the results of this t test.
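
A test between uncorrelated means of this kind can be sketched with a pooled variance estimate. The difference scores are hypothetical, and the pooled-variance form is an assumption; the source names the test but does not give its formula.

```python
from math import sqrt
from statistics import mean, variance

def uncorrelated_means_t(x, y):
    """Pooled-variance t for two independent groups, df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2

# Hypothetical M difference scores for the subjects tested by each examiner.
examiner_1 = [4, 6, 2, 5, 3]
examiner_2 = [1, 3, 2]
t, df = uncorrelated_means_t(examiner_1, examiner_2)
print(round(t, 2), df)
```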

Upon examination of Table 2, it can be 
seen that there was no significant difference 
between the examiners in any of the three 
groups. That is, one examiner did not seem 


to draw more M responses from his subjects 
than the other examiner. 

The questionnaire or opinion sheet, as in- 
dicated previously, was designed to gain only 
a rough estimate of the subjects’ motivational 
involvement in the testing situation. The re- 
sults of the questionnaire indicated that ap- 
proximately 90 per cent of the subjects in 
both the experimental and the control groups 
had a high degree of interest in both the ex- 
perimenter and the experiment. It was felt 
that these conditions led to a set toward, or 
an involvement in the situation conducive to 
optimal results. 


Discussion 


The hypothesis that test results will be in- 
fluenced by what the examiner says and does 
more or less routinely in the testing situation 
seems to have been substantiated, at least in 
terms of this experiment. In interpreting these 
results, it should be kept in mind that the 
samples used were small, but that the differ- 
ences obtained in these small samples had to 
be proportionately larger before the chosen 
significance level could be reached. 

Since the assumptions underlying the sta- 
tistics used are difficult to demonstrate, it 
does not seem entirely legitimate to assume 
without further study that given a different 
population and/or different tests, the results 
would be the same. However, using the pres- 
ent study as a predictive basis, it is indicated 
that examiners need to be more careful of 
what they say and do, in a perfunctory manner,
when they are engaged in testing. At
least, they should be cognizant that even un- 
der what would be assumed to be “stand- 
ardized” conditions, it is possible for their 
behavior to be reflected in the test results. 


Summary and Conclusions 


This experiment was designed to test the 
general hypothesis that test results will be 
modified by those aspects of the testing situa- 
tion which are sometimes not carefully con- 
trolled or are treated as if they were unim- 
portant. Thirty-six male subjects in two ex- 
perimental groups and one control group were 
used to study the effects of perfunctory verbal 
comments and nonverbal actions on test re- 
sults. The findings of the study suggest that 
such comments as “Good” or “Fine” and such 
actions as smiling and nodding by examiners 
have a decided effect upon test results. Thus 
it was indicated that examiners should be 
alert to the fact that even under presumably 
“standardized” conditions, it is possible for
their behavior to be reflected in test results. 


Received May 13, 1955. 


The Relationship Between the Inferential Potential
of Rorschach and TAT Protocols


Leon H. Levy, Janice R. Brody, and Georgia O. Windman 


Indiana University 


A problem frequently faced by the clinician 
in diagnostic work is that of deciding when 
to stop testing and when to continue. This is 
particularly true in those instances where the 
client has produced one test protocol which 
might be said to be low in inferential poten- 
tial, i.e., a protocol which does not seem to 
lend itself to the derivation of meaningful 
clinical hypotheses. The problem investigated 
in this article might be stated in the form of 
the question: Given a test protocol of a cer- 
tain relative level of inferential potential by 
a particular S, to what extent might one ex- 
pect a second and different test protocol to be 
of the same relative level? Stated somewhat 
more elegantly, we are interested in the extent 
to which the variance in inferential potential 
might be accounted for by the individual sub- 
ject (S) and by the subject-test interaction. 

If it were found that the individual S ac- 
counted for a major portion of the variance, 
i.e., if one could predict inferential level from 
one test protocol to the next for any par- 
ticular S, then there would be some basis for 
deciding whether to continue or discontinue 
testing on the basis of tests already adminis- 
tered. If, for example, the S produced a rather 
sterile Rorschach, he might be labeled “de- 
fensive” and sent on his way, or it might be 
decided that the TAT would be the method 
of choice with him. If, on the other hand, it 
were found that the subject-test interaction 
accounted for the major portion of the vari- 
ance then there would be no way of predict- 
ing inferential potential from one test to the 
next, and one would be justified in trying 
several tests until one was found which per- 
mitted an adequate formulation of the case. 


Method


The Rorschach and TAT were the two 
tests used in this study. Twelve cases were 
pulled from the clinic files which met the 
following criteria: (a) each folder contained 
a Rorschach and a TAT administered by the 
same examiner at the same session, and (b)
the same examiner administered the tests in 
all of the cases. In addition, only male col- 
lege students were used. In this way it was 
hoped to minimize any effects of examiner 
differences, changes in attitudes, range of tal- 
ent, etc. The cases were selected from the 
files by the clinic secretary who was unaware 
of the purpose of the study. All Rorschachs 
and TAT’s were then pulled from the case 
folders and coded. 

Because of the complexity of the criterion, 
it was decided to use the paired-comparisons 
method in arriving at rankings of protocols 
on relative levels of inferential potential. Thus 
all twelve Rorschachs were ranked by means 
of paired comparisons and then correlated 
by means of rank-order correlation with the 
twelve TAT’s similarly ranked. Judges were 
instructed to indicate which protocol of each 
pair they felt would permit them to make a 
larger number of inferences about the per- 
sonality of the person producing the proto- 
col. In order to measure interjudge agreement 
three judges were used: Judge A ranked both 
Rorschachs and TAT’s, Judge B ranked only 
the Rorschachs, and Judge C ranked only the 
TAT’s. 
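
The paired-comparisons ranking described above can be sketched as a count of how often each protocol is chosen over the others. The protocol labels, the stand-in judge, and the tie handling are illustrative assumptions; the source does not say how ties were resolved.

```python
from itertools import combinations

def rank_by_paired_comparisons(items, prefer):
    """Rank items by how often each is chosen across all pairs.
    `prefer(a, b)` returns whichever of the two the judge picks."""
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        wins[prefer(a, b)] += 1
    # Most-preferred first; ties keep their original order.
    return sorted(items, key=lambda item: -wins[item])

# Hypothetical judge who always prefers the protocol with more responses.
lengths = {"P1": 14, "P2": 31, "P3": 22, "P4": 9}
ranking = rank_by_paired_comparisons(
    list(lengths), lambda a, b: max(a, b, key=lengths.get)
)
print(ranking)
```

With twelve protocols, each judge makes 66 such comparisons per test, since every protocol is paired once with each of the other eleven.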


Results 


Rank-difference correlations were calculated 
in order to estimate interjudge agreement as
well as the relationship between relative level
of inferential potential on the two tests. In 
each case their statistical significance was 
tested by reference to Olds’s (1) tables of 
the distribution of sums of squares of rank 
differences. 

The value of rho for Judges A and B on 
Rorschach rankings was .94 while the rho be- 
tween Judges A and C on TAT rankings was 
.92. Both of these are significant at the .01 
level. Because of the apparently high level of 
agreement between judges, the ranking of 
Rorschachs and TAT’s on relative level of 
inferential potential was based on the com- 
bined rankings of the two judges for each test. 
Rho between Rorschach and TAT rankings 
was .65 which is significant at the .02 level. 
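
The rank-difference correlation used here can be sketched with the usual formula for untied ranks, rho = 1 - 6(sum of d squared) / (n(n squared - 1)). The two sets of ranks are hypothetical, and the sketch does not handle the ties that combined rankings from two judges might produce.

```python
def rank_difference_rho(ranks_x, ranks_y):
    """Spearman rank-difference correlation for untied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks of twelve subjects on the two tests.
rorschach = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
tat = [2, 1, 5, 3, 4, 8, 6, 10, 7, 12, 9, 11]
rho = rank_difference_rho(rorschach, tat)
print(round(rho, 2))
```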


Discussion 


The data seem to indicate that the judges 
were in agreement with each other in their use 
of the criterion of inferential potential. This 
of course does not imply that had they each 
attempted to derive inferences from the ma- 
terial there would necessarily have been any 
substantial agreement between them either in 
the content or the number of these inferences. 

Our findings would seem to support the hy- 
pothesis, for these two tests at least, that the 
individual S accounts for a major portion of 
the variance in inferential potential and, con- 
sequently, that one should be able to predict 
relative level of inferential potential from one 
test to another for any given S. 

To the extent that these findings may be 
generalized to other populations, their prac- 
tical implications are fairly obvious, viz., that 
where one has obtained a Rorschach (TAT) 
which is fairly barren in diagnostic signifi- 
cance, there is little point in administering a 
TAT (Rorschach) in the hope that perhaps 
this test will do the trick. They fail to sup- 
port the assumption sometimes made that the 
more tests we have on the individual the 
more we are likely to learn about him. 





Theoretically, these findings raise several 
interesting issues. For one thing, it may well 
be that one of the most diagnostically signifi- 
cant findings in a particular case is that the 
S produces a protocol either high or low in 
inferential potential, i.e., we may be dealing
here with an important personality variable. 
Research directed along these lines attempt- 
ing to determine the behavioral correlates of 
this variable may go a long way toward en- 
hancing our understanding of personality and 
reducing the clinician’s feelings of inadequacy 
when he comes up against a client who 
“doesn’t give.” 

Another issue is the economic one. The cli- 
nician faced with the task of evaluating a 
client usually has only a limited amount of 
time in which to accomplish this objective. 
How can this time be spent most profitably? 
Our findings provide no answer to this ques- 
tion other than a negative one. It would seem 
though that some criterion, undoubtedly much 
more elaborate than that used in this study, 
might be employed so that the clinician could 
decide when he has reached the point of di- 
minishing returns with a client. 


Summary 


Judges ranked 12 Rorschach and TAT pro- 
tocols on the basis of their relative inferential 
potential. A significant relationship was found 
between the two tests and this was interpreted 
as indicating that the individual S accounts 
for a major portion of the variance in infer- 
ential potential. Certain practical and theo- 
retical implications of these findings were dis- 
cussed. 


Received May 20, 1955. 


Dependency Themes on the TAT and
Group Conformity¹


Jerome Kagan and Paul H. Mussen 
The Ohio State University 


Many personality theorists use the concept 
of “dependence” in explanatory formulations 
and clinicians often make statements about 
the strength or intensity of an individual’s 
dependent needs. Although the specific be- 
havioral referents for this concept have not 
been explicitly or unambiguously defined, 
there is probably general agreement that de- 
pendent tendencies include “the need for 
emotional or authoritative support in most 
situations, difficulty in making independent 
decisions and taking on responsibilities and 
the dread of loneliness” (5, p. 392). 

This statement implies that the dependent 
person perceives others as more competent 
than he, and he is likely to seek and accept 
the guidance and advice of others when he 
has to make a decision. Since the dependent 
individual would regard the opinion of a 
group as wiser and more reliable than his 
own, it might be predicted that he would con- 
form with group judgments. Thus, if the 
tendency to conform to group opinion is ac- 
cepted as one behavioral referent for depend- 
ency, clinical evaluations of this character- 
istic could be validated by correlating the 
clinical measures with group conformity be- 
havior. 

The Thematic Apperception Test is often 
used as a technique for evaluating the differ- 
ential strengths of various motives and there 
have been numerous attempts to relate the 
needs revealed in TAT stories to overt be- 
havior. 

Apparently, the relationship between fan- 
tasy needs and corresponding overt behavior 
is better for some motives than for others. 


1 The authors wish to acknowledge the assistance
of Mr. John Anderson in the execution of this study. 

Murray (6), studying a group of college men,
obtained correlations of over +.40 between
TAT fantasies involving nurturance and passivity
and overt behavior ratings of these
variables but found correlations of near zero
between sexual and aggressive needs and their
corresponding overt behaviors.

Sanford (9) and Tompkins (10) have sug- 
gested that if the motive in question is pro- 
hibited or punished in the individual’s social 
milieu, e.g., aggression or sex in the middle 
class, there is apt to be little or no relation- 
ship between the strength of the fantasy need 
and the corresponding overt behavior. Thus, 
Bach (2), Korner (4), Pittluck (8), and San- 
ford (9) did not find direct relationships be- 
tween the amount of aggressive TAT or doll 
play fantasy and the strength of overt ag- 
gression among children or adults. 

On the other hand, if the motive in question
is culturally sanctioned, there is more
likely to be a positive relationship between
fantasy and overt expression. Thus, Mussen
and Naylor (7) found a positive relationship
between TAT fantasy aggression and overt 
aggressive behavior among boys of the lower 
class, for whom such behavior is often ap- 
proved. 

Dependent behavior, also, has some cul- 
tural sanction, for seeking advice or help in a 
problem situation is often encouraged and 
praised. For this reason one might predict a 
more direct relationship between dependent 
fantasy themes and overt dependent behav- 
ior. The present research was designed to test 
the hypothesis that there is a positive asso- 
ciation between TAT dependency themes and 
the tendency to adopt objectively inaccurate 
group judgments in a situation where there is
strong social pressure favoring conformity. 


Method 


The subjects were 27 male undergraduates 
enrolled in the elementary psychology course 
at The Ohio State University. Their ages 
ranged from 17 to 31 years with a mean of 
21.6. 

In the first part of the experiment, each 
subject was instructed to write stories in re- 
sponse to eight TAT cards from the Murray 
series in a prescribed order (1, 6BM, 7BM, 
3BM, 12M, 13B, 14, and 18BM). He was 
told to make each story about a half a page 
long but not to write for more than two or 
three minutes on each card. Previous research 
(3) indicates that there are no important 
differences between the thematic contents of 
written and oral TAT stories. 

After he completed the TAT, the subject 
was asked to cooperate in a seven- or eight- 
minute experiment on vision, conducted by 
another experimenter who needed additional 
subjects. This second situation was identical 
with the conformity situation described by 
Asch (1). The subject was led into a small 
room with four other men. Unknown to the 
experimental subject, these other four men 
were paid “stooges” cooperating with the ex- 
perimenter. When the five men were seated, 
the experimenter gave the following instruc- 
tions: 


This is a task which involves the discrimination of 
lengths of lines. You see the pair of white cards in 
front. The card on the left shows a single line; the 
card on the right has three lines differing in length. 
They are numbered 1, 2, and 3 in order. One of the 
three lines on the card on the right is equal in 
length to the standard line on the card on the left. 
You will decide in each case which is the equal line. 
You will state your judgment in terms of the corre- 
sponding number, either one, two or three. There 
will be 12 such comparisons. Since the number of 
lines is few and the group small, I shall call upon 
each of you in turn to announce your judgment, 
which I shall record here on a prepared form. Please 
be as accurate as possible. Suppose we start at the 
right and proceed to the left. 


The naive subject was always seated fourth
from the right and was always the fourth person
to call out his judgment. When all five
subjects had stated aloud their judgments on
one pair of cards, these two cards were removed
and replaced by a new pair with new
standard and comparison lines. This was repeated
12 times with 12 different pairs of
cards. The measurements of the standard and
comparison lines were identical with those
used by Asch (1).

The four men cooperating with the experi- 
menter had been previously instructed to an- 
nounce correct answers on five trials (1, 2, 
5, 8, and 11) and to announce incorrect an- 
swers on the other seven trials (3, 4, 6, 7, 9, 
10, and 12). On each of these seven crucial 
trials, all four announced the same incorrect 
answer and constituted a majority opinion. 

The TAT stories were analyzed for two 
types of themes suggesting dependency needs. 
Themes in which the hero sought help from 
another individual in solving a personal prob- 
lem or was disturbed over the loss of a source 
of love and support were classified as D. 

It should be noted that D themes are similar
to those which Murray considers indicative
of n Succorance. Only one D theme was
scored for each story.

The second category, d themes, was scored
when the hero was given some help or gift 
(advice, food, money, etc.) which was not 
specifically requested. This type of story, 
closely related to Murray’s category of p 
nurturance, is probably a more indirect meas- 
ure of dependency. However, some clinicians 
might argue that stories in which the hero re- 
ceives unsolicited nurturance reflect the story 
teller’s dependency needs. 

All stories were independently scored by the 
authors without knowledge of the subject’s 
group conformity behavior. Percentages of 
agreement for D and d themes were 93 and 
95 per cent, respectively, indicating that the 
scoring system was highly reliable. 


Results 


Sixteen of the 27 subjects conformed to the 
incorrect group judgment on one or more of 
the seven crucial trials. The number of 
“yields” per subject (conformity with the in- 
correct judgment) ranged from zero to seven 
with a median of one for the entire group. 
There was no significant difference between
the mean ages of those who yielded (22.3)
and those who did not (20.6).


Table 1


Distribution of “Yielders” and “Nonyielders” Among
Subjects High and Low in D Themes

                    D Themes
Group            High     Low
Yielders          10        6
Nonyielders        0       11

χ² = 8.52, p < .01.


Table 2


Distribution of “Yielders” and “Nonyielders” Among
Subjects High and Low in d Themes

                    d Themes
Group            High     Low
Yielders           9        7
Nonyielders        4        7

χ² = .39, p > .50.

For this group of subjects production of D 
themes was relatively infrequent, the range 
being zero to three. Fifteen of the 27 subjects 
wrote at least one D theme and the median 
value for the group was one. Twenty-one of 
the 27 subjects produced one or more d 
themes with a range of one to five and a 
median of one for the entire group. 

In order to test the relationship between 
dependency themes and conformity with group 
opinion, the 27 subjects were divided into 
groups of “yielders” (those who conformed at 
least once) and “nonyielders.” The distribu- 
tions of D and d scores were dichotomized 
into high (above median) and low (median 
and below) groups. 

Tables 1 and 2 show the frequency of the
number of yielders and nonyielders among
those with high and low D and d scores,
respectively. The values obtained when chi-square
tests, using Yates’s correction, are applied
to these data are also given.
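
The chi-square computation described above can be sketched as follows. This is a minimal illustration, not the authors' own procedure: the function name is hypothetical, the formula is the standard Yates-corrected chi-square for a 2 x 2 table, and the High-d nonyielder cell is assumed to be 4, the count implied by the total of 11 nonyielders. Under these assumptions the Table 2 frequencies give .39, as printed, while the Table 1 frequencies give about 8.40, slightly below the printed 8.52 but still beyond the .01 level.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with Yates's continuity correction for the 2 x 2 table
    [[a, b], [c, d]] (hypothetical helper, standard textbook formula)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates's correction: reduce |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)


# Rows: yielders, nonyielders. Columns: high, low theme scores.
d_table = yates_chi_square(10, 6, 0, 11)      # D themes (Table 1)
small_d_table = yates_chi_square(9, 7, 4, 7)  # d themes (Table 2), 4 assumed

print(round(d_table, 2), round(small_d_table, 2))
```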


Discussion 
The results indicate that conformity to
group opinion can be predicted from certain
kinds of dependency themes on the TAT.
Table 1 shows that every one of the 10 subjects
who produced two or more D themes
yielded to the incorrect majority opinion.

On the other hand, the more indirect meas- 
ure of dependency, d themes, was not sensi- 
tive in predicting group conformity. Appar- 
ently individuals who see their heroes as ac- 
tively seeking aid are more likely to behave 
dependently (as defined here) than those in- 
dividuals who portray their heroes receiving 
unsolicited nurturance. 

In addition to the need for support in mak- 
ing decisions, fear of loneliness and isolation 
might characterize the dependent individual. 
Thus, fear of alienation by the group for dis- 
agreeing with the majority judgment may 
have motivated conformity with the incorrect 
group opinion. Minimal support for this no- 
tion is found in the TAT themes in response 
to the picture of the boy sitting alone on the 
door step of a log cabin (Card 13B). Al- 
though themes of loneliness were infrequent 
for these subjects, four of the five individuals 
who described the boy as lonely, yielded to 
the majority opinion. 

The unique qualities of the conformity 
situation suggest the need for caution in gen- 
eralizing the implications of these findings. 
Although the subjects who wrote several D 
themes conformed in a pressure situation with 
same sex peers, these men might react differ- 
ently with individuals of different sex or 
status, e.g., younger women, authority fig- 
ures, lower class individuals, etc. 

Moreover, the pressures to conform in this 
situation were apparently very strong, for all 
subjects, yielders and nonyielders, appeared 
uncomfortable when their judgments dis- 
agreed with the incorrect majority. Those 
who yielded in this laboratory situation, 
therefore, might not conform in social con- 
texts characterized by considerably less group 
pressure. 


Summary 


The purpose of this study was to relate de- 
pendency themes on the TAT to the tendency 
to conform to group opinion. 

Twenty-seven male undergraduates wrote 
stories to eight TAT cards and then were in- 
dividually observed in the Asch conformity 
situation. In this situation, the subject and
four other male students were requested to 
discriminate lengths of lines and orally an- 
nounce their judgments. Unknown to the ex- 
perimental subject, the other four men were 
cooperating with the experimenter and on 
prearranged discrimination trials, announced 
incorrect answers in an attempt to pressure 
the naive subject to conform with the incor- 
rect majority judgment. 

The subjects who produced TAT themes in 
which the hero sought help in a problem 
situation or was portrayed as disturbed over 
loss of sources of love and support yielded to 
the incorrect majority more frequently than 
those subjects not writing these types of 
stories (p < .01). 


Received June 6, 1955. 
Cross Validation of Objective TAT Scoring ¹


Richard H. Dana 
St. Louis State Hospital 


Objective TAT scoring categories have only 
recently been introduced (2, 6). Three of five 
categories originally proposed as significant 
for diagnostic differentiation successfully dis- 
tinguished between normal and abnormal male 
groups (2). As prediction measures two of 
these categories, Perceptual Organization and 
Perceptual Range, identified 76 and 70 per 
cent of normals, 60 and 64 per cent of neu- 
rotics, and 78 and 84 per cent of psychotics, 
respectively in the experimental groups. Per- 
ceptual Personalization prediction scores iso- 
lated 90 per cent of the normal group. These 
figures are the result of use of normal-neu- 
rotic and neurotic-psychotic medians as cri- 
teria. 

The application of these three scoring 
categories to other populations, i.e., cross 
validation, and the accessibility of results on 
increasingly larger groups is a necessary fur- 
ther development. 


Scoring System 


Three scoring categories, Perceptual Or- 
ganization (PO), Perceptual Range (PR), 
Perceptual Personalization (PP), were se- 
lected from the five original scoring cate- 
gories on the basis of their prediction scores. 

Perceptual Organization. The category re- 
flects the S’s ability to follow the standard 
directions to “tell a story.” Components in- 
cluded are card description, present behavior, 
past events, future events, feeling, thought, 
and outcome. One point is scored for each 
component used in the story. 

Perceptual Range. Three separate stimulus 
properties were selected for each card on the 


1 This study was carried out in connection with an 
investigation supported by a research grant from 
the National Institute of Mental Health, of the Na- 
tional Institutes of Health, Public Health Service. 


basis of utilization by at least 90 per cent of
a “normal” group (7). PR is designed to sample
the extent to which the usual or “popular”
appears in the story. A list of the 15
stimulus properties follows: Card II (a)
family: young girl; woman, activity specified;
adult male; (b) fields or farm; (c) books or
school; Card III (d) figure, sex and age
specified; (e) emotions noted; (f) activity
specified; Card IV (g) man, emotions noted,
activity specified; (h) woman, activity specified;
(i) conflict or cooperation; Card VI
(j) man, emotions noted, activity specified;
(k) woman, emotions noted; (l) personality
referent; Card VII (m) older male, activity
specified, relationship specified; (n) male,
emotions noted; (o) personality referent.
One point is scored for each of these mentioned
in the TAT story. All items included
in each point must be mentioned for score to
be earned.

Perceptual Personalization. Some expres- 
sions, words, and phrases used in the story 
are incongruous and have no obvious refer- 
ence to the story that S is trying to relate. 
These inclusions are clearly neither stimulus 
reproductions nor additions to the stimulus. 
PP are deviations from the relatively con- 
sistent, organized, coherent protocol-product, 
the TAT story. These deviations, in order to 
be scored, must be extreme. They may refer 
to things labeled performance adequacy, com- 
ments, parenthetical remarks, qualifications, 
picture criticisms, adventitious descriptions, 
vagueness, evasion, or direct personal refer- 
ence. One point is scored for each personal- 
ized inclusion. 


Problem 


Cross validation is an attempt to ascertain
how well these selected TAT scores will function
when applied to different samples of Ss.
If these TAT scores are to be usable on groups 
other than those on which they were devel- 
oped, they should be significantly related to 
clinical diagnosis on cross-validation samples. 
Furthermore, all three scoring categories 
should significantly differentiate between nor- 
mal and clinical groups. 

Subjects. The TAT stories resultant from 
administration of cards 2, 3BM, 4, 6BM, 
7BM (5) were collected from three groups 
of male Ss: normals, neurotics, psychotics, 
30 in each group. The Ss were chosen in 
terms of the same criteria as in the original 
study with one difference: the neurotic Ss 
were not hospitalized but were outpatient 
clinic patients. 

Procedure. The 450 stories obtained from 
the three groups were scored for each of the 
three categories. The five stories from each 
S were treated as a unit. These units were 
coded, randomized, and scoring was done fol- 
lowing mimeographed directions on a spe- 
cially prepared score sheet by the author 
and a clerk.²


Results 


Reliability. Two kinds of scorer reliability 
were computed by means of percentage of 
agreement: (a) scoring category reliability; 
(b) reliability of the items composing the
categories. The appropriateness of percentage 
of agreement for estimating projective test 
data reliability has been discussed previously 
by the author (3, 4) and others (8). 

Scoring category reliability was determined 
from the results of two scorers on 75 stories, 
25 from each group, chosen at random, and 
scored in random order to avoid knowledge 
of the group to which any particular record 
belonged. The reliabilities of PO, PR, and PP 
were 93, 90, and 75 per cent of agreement, 
respectively. The over-all reliability of these 


2 Mimeographed copies of the TAT scoring direc- 
tions and the score sheet may be obtained from the 
author upon request, or from the American Docu- 
mentation Institute. To obtain them from the latter 
source, order Document No. 4743 from ADI Aux- 
iliary Publications Project, Photoduplication Service, 
Library of Congress, Washington 25, D. C., remit- 
ting in advance $1.25 for microfilm or $1.25 for 
photocopies. Make checks payable to Chief, Photo- 
duplication Service, Library of Congress. 


three scoring categories was 89 per cent. 
These figures are somewhat higher than those 
reported in the original study. These differ- 
ences can be attributed to: (a) practice in 
scoring obtained by both scorers; (b) the
formalization of scoring criteria on a com- 
pact, easily used score sheet. 

The reliability of scoring separate PO and 
PR items ranged from 80 to 100 per cent of 
agreement. The percentages of agreement for 
PP were 81, 73, 74, for normals, neurotics, 
and psychotics, respectively. 

Validity. Validity, with diagnosis as the ex- 
ternal criterion, was obtained by use of a non- 
parametric median test. Table 1 presents the 
median and range for each of the three 
groups on each scoring category. The scores 
in each category were placed in rank order 
and combined medians obtained for the 
groups being compared: normals-neurotics, 
normals-psychotics, neurotics-psychotics. The 
number of cases falling above and below these 
combined medians was determined by inter- 
polation. Chi square was then used to deter- 
mine differences between groups (Table 2). 
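
The median test described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the scores are hypothetical, cases are counted as strictly above the combined median instead of being apportioned by interpolation as in the study, and no continuity correction is applied because the text does not say whether one was used.

```python
from statistics import median


def median_test(group_a, group_b):
    """Two-group median test: classify cases as above or not above the
    combined median, then compute chi-square on the resulting 2 x 2 table."""
    combined = median(list(group_a) + list(group_b))
    a_above = sum(score > combined for score in group_a)
    b_above = sum(score > combined for score in group_b)
    a_below = len(group_a) - a_above
    b_below = len(group_b) - b_above
    n = len(group_a) + len(group_b)
    numerator = n * (a_above * b_below - b_above * a_below) ** 2
    denominator = (
        (a_above + a_below) * (b_above + b_below)
        * (a_above + b_above) * (a_below + b_below)
    )
    chi_square = numerator / denominator if denominator else 0.0
    return combined, a_above, b_above, chi_square


# Hypothetical PO-like scores for two small groups (not data from the study).
normals = [26, 27, 28, 25, 24]
neurotics = [18, 19, 17, 20, 24]
print(median_test(normals, neurotics))
```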

The median test results indicate: (a) PO
and PR significantly differentiate between normal,
neurotic, and psychotic groups. (b) PP
significantly differentiates between normal and
clinical groups, but does not distinguish between
clinical groups. Thus, cross validation
of PP does not substantiate the significant
differentiation obtained between neurotic and
psychotic groups in the original study.


Table 1


Median and Range for Normal, Neurotic, and Psychotic
Ss on Perceptual Organization (PO), Perceptual
Range (PR), and Perceptual
Personalization (PP)

Category        Median    Range
PO
  Normal         26.50    22-31
  Neurotic       18.50    11-24
  Psychotic      10.78     6-21
PR
  Normal         13.38    11-15
  Neurotic        8.78     5-11
  Psychotic       5.17     1-9
PP
  Normal           .29     0-6
  Neurotic        3.83     0-29
  Psychotic       5.00     0-40


Table 2


Summary of Median Test Results on PO, PR, PP
Scores of Three Groups

                Combined    Above
Category        median      median    χ²      p
PO
  Normal        23.0        [?]       [?]     <.001
  Neurotic                  1.0
  Normal        21.50       30.0      58.0    [?]
  Psychotic                 [?]
  Neurotic      [?]         28.0      [?]     [?]
  Psychotic                 2.0
PR
  Normal        [?]         [?]       [?]     <.001
  Neurotic                  [?]
  Normal        [?]         30.0      [?]     [?]
  Psychotic                 1.0
  Neurotic      [?]         29.0      [?]     [?]
  Psychotic                 [?]
PP
  Normal        [?]         8.2       [?]     [?]
  Neurotic                  27.1
  Normal        [?]         6.2       [?]     [?]
  Psychotic                 [?]
  Neurotic      [?]         15.0      [?]     [?]
  Psychotic                 18.0

[?] Entry illegible in the source.

An approximate prediction measure was 
employed to determine the extent to which 
individuals are identified correctly by each of 
these scoring categories. The two medians re- 
ported in the original study for each scoring 
category were used as criteria: normal-neu- 
rotic and neurotic-psychotic. The median 
scores used for PO were 24.50 and 16.94; 
PR scores were 11.19 and 7.68; and PP 
scores were 1.90 and 7.37. Scores below the 
normal-neurotic median were given 0; scores 
falling between the normal-neurotic median 
and the neurotic-psychotic median were given 
1; and scores above the neurotic-psychotic 
median were given 2. 
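
The cutoff scoring just described can be sketched as follows. The cutoff values are those quoted above from the original study; the function name is hypothetical. The text's wording ("below the normal-neurotic median were given 0") holds literally for PP, where normals score lowest. For PO and PR, where normals score highest, the sketch assumes that 0 is assigned on the normal side of the first cutoff and 2 on the psychotic side of the second. The treatment of scores falling exactly on a cutoff is also an assumption.

```python
# (normal-neurotic median, neurotic-psychotic median) from the original study.
CUTOFFS = {
    "PO": (24.50, 16.94),
    "PR": (11.19, 7.68),
    "PP": (1.90, 7.37),
}


def prediction_score(raw, normal_neurotic, neurotic_psychotic):
    """Return 0 (normal range), 1 (neurotic range), or 2 (psychotic range).

    The direction of scoring is inferred from the order of the two cutoffs:
    if the normal-neurotic cutoff is the larger one, high raw scores are
    treated as the normal direction (assumed for PO and PR).
    """
    if normal_neurotic > neurotic_psychotic:
        if raw > normal_neurotic:
            return 0
        return 1 if raw > neurotic_psychotic else 2
    if raw < normal_neurotic:
        return 0
    return 1 if raw < neurotic_psychotic else 2


# Illustrative raw scores (the group medians reported in Table 1).
for category, raw in [("PO", 26.50), ("PO", 18.50), ("PO", 10.78), ("PP", 0.29)]:
    print(category, raw, prediction_score(raw, *CUTOFFS[category]))
```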

Table 3 shows the results for each scoring 
category. These figures may be combined in 
various ways to provide the discrimination of 
each scoring category with each cutoff point, 
0, 1, 2. Maximum discrimination is found by 
selecting a cutoff score which exaggerates 
the differences between groups. Thus, PR re- 
mains the “best,” i.e., maximally discriminat- 
ing category on which 98 per cent of normal 
Ss had 0, 81 per cent of neurotic Ss had 1,
and 84 per cent of psychotic Ss had 2 scores. 

Comparison of these prediction scores with 
those obtained in the original study (2) indi- 
cates that both PO and PR retain their ef- 
fectiveness on new populations. In fact, the 
prediction level for the cross-validation groups 
is somewhat higher than in the original 
groups. The level of prediction for PP is sub- 
stantially equivalent to that originally ob- 
tained. PP remains effective for distinguish- 
ing normal from pathological groups but not 
for more precise differentiation of degree of 
illness. 

Cutoff scores were also determined in the 
same manner from the data of this cross- 
validation sample using the medians reported 
in Table 2. Prediction levels were enhanced 
by this method. For scoring category PO, 97 
per cent of normal Ss had 0, 90 per cent of 
neurotic Ss had 1, and 93 per cent of psy- 
chotic Ss had 2 scores. For PR, 100 per cent 
of normal Ss had 0, 96 per cent of neurotic 
Ss had 1, and 93 per cent of psychotic Ss 
had 2 scores. PP prediction scores were not 


appreciably different from those obtained 
using cutoff scores derived from the original 
study. 


Table 3


Percentage of Ss in Each Group Receiving Scores of
0, 1, 2 on Each Category

                    Category
Group          PO      PR      PP
Normal
  0            80      98      72
  1            20       2      28
  2             0       0       0
Neurotic
  0             1       2      10
  1            66      81      64
  2            33      17      26
Psychotic
  0             0       0       6
  1             6      16      57
  2            94      84      37


It is suggested that cross-validation differences
may result from the use of an outpatient
neurotic group which exaggerated the
differences between groups by providing Ss
more nearly equidistant on a hypothetical
continuum of “illness” between the normal
and psychotic groups. This suggestion is given 
credence by the almost complete absence of 
overlap between the normal and psychotic 
groups in the original study. To a consider- 
ably lesser degree, the additional training and 
experience of scorers reflected in the higher 
reliability figures may have been operative in 
sharpening differences between groups. The 
fact remains that there was minimal overlap 
between the three groups for PO and PR 
scores. 


Discussion 


The use of objective TAT scores for male
groups seems to have considerable diagnostic
power. This success may be attributable, at
least in part, to test scoring based on a
rationale for simple, consistent, theoretically
derived objective scores for all projective
tests (3). This rationale is based on three
aspects of test behavior deemed sufficient for
development of objective scoring systems: (a)
approach to the situation (reflected in the
manner standard test directions are followed);
(b) normality of response (abstractions of
structural and content material included by
specified percentages of “normal” Ss); (c)
rarity of response (those infrequent responses
in a “normal” population which appear with
significantly greater frequency in psychopathological
conditions). The direct relationship
between these three aspects of test behavior
and the TAT scoring categories PO,
PR and PP is clear. Certain controversial
theoretical assumptions concerning personality,
i.e., Personality Orientation (1), antedate
the rationale from which these scores
were developed. However, the use of these
categories demands no particular theoretical
approach to personality. Similarly, the development
of objective TAT scores does not
imply abandonment of content analysis but
is merely a formal aid to this process.

However, objective TAT scoring to attain 
more than mere passive clinical recognition 
must yield descriptive personality data. Fur- 
ther research must concentrate upon this as- 
pect of validity. 


Received June 1, 1955. 
Role Taking in Schizophrenia


Isidore Helfand ¹


Teachers College, Columbia University ²


Experimental investigations of role taking 
or the empathic process have been primarily 
conducted with “normal” Ss, especially col- 
lege students. The results of these investiga- 
tions, notably the studies of Dymond (3, 4, 
5) and McClelland (8), suggest that the 
greater the degree of emotional instability, 
the poorer the ability of the Ss to take the 
role of another. These findings are consistent 
with the hypotheses offered by Cameron (1, 
2) who suggested that a disturbance in role- 
taking skills is crucial to the development of 
schizophrenia. 

The present experiment was designed to 
explore role-taking characteristics in schizo- 
phrenia. In addition, it was hoped that some 
light would be cast upon the characteristics 
of empathic ability, not only in schizophrenia, 
but in the normal individual. Specifically, two 
major hypotheses were tested: 

1. Schizophrenics, when compared with non- 
psychotic individuals, show poorer role-taking 
skills. 

2. Nonpsychotic individuals, when com- 
pared to schizophrenics, are more homoge- 
neous in their agreement with one another as 
to the characteristics of the individual whose 
roles they are instructed to take. 


Procedure 


Studies in role taking have used what might 
be termed the method of personal acquaint- 


1 The present paper is a condensation of a part of 
a doctoral dissertation completed at Teachers Col- 
lege, Columbia University. The author wishes to ex- 
press his indebtedness to the chairman of his dis- 
sertation committee, Professor E. J. Shoben, Jr., and 
to Professors L. F. Shaffer and Herbert Solomon for
their guidance and assistance. 

2 Now at Park Lane Neuropsychiatric Clinic,
Cleveland. 


anceship. Students sharing a common course 
in college are asked to rate one another. In 
a study which seeks to compare schizophrenics 
and normal individuals, such a procedure be- 
comes impractical. It is especially so when it 
is desired that the degree of familiarity as 
well as the type of relationship be controlled. 

In the present study, therefore, an auto- 
biography obtained from a former hospital 
patient was used as a common stimulus. The 
patient had been hospitalized briefly for one 
of a series of minor recurring depressions. He 
was frank, honest, and, in the opinion of the 
hospital staff, someone with whom one could 
readily empathize. He was asked to discuss 
his life with reference to parents, siblings, 
courtship, marriage, sex, education, and vo- 
cation. From this material, an 80 item Q sort 
was constructed. Items were selected which 
minimized reactions on the basis of fact; 
rather, they emphasized the need for infer- 
ence on the part of the S. This procedure also 
reduced the likelihood of the sort’s being a 
mere reading task. 

The Q sort was then administered to the 
author of the autobiography with instructions 
to distribute the items so as to reflect most 
accurately his feelings and attitudes. The re- 
sult is referred to hereafter as the Criterion 
sort. 

Each of the Ss was asked to take the Q
sort twice: First, they sorted the items to
reflect their own attitudes and feelings. The
second time, they were read the autobiography.
They also had a copy which they
could read and refer to. They were then asked
to take the Q sort again, but this time as it
reflected the ideas of the author of the autobiography.
They were to act like him when
they took the test. The present paper is concerned
with this second or Simulated sort and
its relationship to the Criterion.


Table 1


Summary of Background Data

                          C’s     P’s     T.B.    Normal   F Ratio†
Age in years              35.9    38.2    37.9    33.4       .99
School grade completed    10.6    11.5    10.1    12.2      3.61*
Reading grade level        8+      8+      8+      8+
Reading score             14.9    17.4    18.5    19.6     17.14**
Vocabulary raw score      23.2    26.5    23.6    25.1      1.3
Hospitalization months    43.9    37.2    37.8              1.06

† The F ratios are the results of the analyses of variance on each of the five factors indicated.
* Significant at the .05 level of confidence (F.05 = 2.76).
** Significant at the .01 level of confidence (F.01 = 4.13).


Subjects 


All 64 Ss were given screening tests consist- 
ing of the Wechsler-Bellevue Vocabulary Test 
(12) as an estimate of verbal abilities and a 
reading test (10). The aim here was to elimi- 
nate illiterates, functional mental defectives, 
and those too uncooperative or inattentive to 
manage the tasks adequately. 

Of the 200 male schizophrenic patients at a 
chronic treatment center between the ages of 
20-45 who had been hospitalized for two to 
five years, 25 remained after screening. These 
25 patients fell into two groups, a chronic 
group of 15 patients, and a privileged group 
of 10 patients. Chronic patients are confined 
to the wards and have a poor prognosis. Their 
activities are limited to simple routine ward 
chores. Privileged patients have made a suffi- 
cient recovery to be permitted freedom of the 
grounds. They worked in the hospital shops 
and some were being considered for discharge. 

A group of 19 tuberculous patients were in- 
cluded as a control for any possible deteriora- 
tion due to hospitalization. They were non- 
psychotic and had never required consulta- 
tion for a mental disorder. They had been 
hospitalized for a period ranging from twenty 
months to five years. 

The normal group included 20 individuals 
who were functioning members of their com- 
munity. None had been hospitalized for a 
mental disorder or had ever sought profes- 
sional help for such a disturbance. They were 
firemen, machine operators, clerks and the 
like. All were gainfully employed. 


Table 1 provides a summary of the back- 
ground data of the subjects employed. An 
analysis of variance indicated that the groups 
differed significantly on two factors: Educa- 
tion (at the .05 level of confidence) and read- 
ing ability (at the .01 level of confidence). 
Chronic schizophrenics and tuberculous pa- 
tients appear to be less well educated than 
the other two groups. While all groups were 
able to surpass requirements for an eighth 
grade reading level, chronic schizophrenics 
show a marked impairment in these skills, 
possibly as a function of a concentrative 
difficulty. Age, length of hospitalization, and 
verbal abilities do not differ significantly 
among the groups. 

The influence of education and reading 
ability was considered in the analysis of the 
results. 


Results 


Empathic ability was determined by com- 
paring the Simulated sort with the Criterion 
sort. The individual correlations were con- 
verted to z scores since correlations are not 
normally distributed. The resulting array was 
subjected to an analysis of variance. The F 
ratio obtained (F = 11.05) indicates that the 
between group variance was greater than the 
within group variance to a degree in excess of 
that which might be anticipated by chance 
(F.01 = 4.13). The analysis was continued using
a t test, the results of which can be seen
in Table 2.
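
The conversion and analysis described above can be sketched as follows. This is a minimal illustration, assuming that "z scores" refers to Fisher's r-to-z transformation (the inverse hyperbolic tangent of r) and that the analysis of variance was a simple one-way design across the four groups. The function names and the correlations are hypothetical and are not data from the study.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)


def one_way_f(groups):
    """F ratio for a one-way analysis of variance
    (between-group mean square divided by within-group mean square)."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical Simulated-versus-Criterion correlations for four groups.
correlations = {
    "chronic": [0.02, -0.10, 0.15, 0.08],
    "privileged": [0.45, 0.38, 0.50, 0.33],
    "tuberculous": [0.30, 0.42, 0.25, 0.36],
    "normal": [0.28, 0.20, 0.35, 0.31],
}
z_groups = [[fisher_z(r) for r in rs] for rs in correlations.values()]
print(round(one_way_f(z_groups), 2))
```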

Table 2


Mean Empathy Scores of All Groups and the
Significance of the Differences
Between Means

          C’s     P’s     T.B.    Normal
Mean      .050    .437    .356    .290
SD        .236    .165    .149    .153

[The t values for the differences between means are illegible in the source; one legible entry reads 4.61**.]

* Significance at or better than the .05 level of confidence.
** Significance at or better than the .01 level of confidence.


As can be seen, all groups differ significantly
from the chronic schizophrenic group
in their ability to empathize. The privileged
schizophrenic group is superior to all others.
This superiority is particularly striking when
the privileged patients are compared with normal
individuals. Tuberculous patients, while
significantly different from the chronic schizophrenics,
stand between the privileged and
normal groups and are significantly different
from neither.

To determine whether these results were influenced
either by reading ability or education,
an analysis of these two factors was
carried out using the Mann-Whitney U test.
The entire group was split in two, a high and
low empathic group. A comparison was then
made between poorer readers and more proficient
readers. A similar analysis was made
comparing education. A critical ratio of .49
for reading ability and .36 for education suggested
that these factors had little bearing on
empathic ability.

The second hypothesis was tested by correlating
the Simulated sort of each individual
with the sort of every other individual in his
group. The distribution of correlations and
their medians can be seen in Table 3.
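
The intragroup procedure can be sketched as follows. This is an illustrative sketch only: the function names and the item placements are hypothetical, and the Pearson product-moment coefficient is assumed as the correlation. With every sort paired against every other sort in its group, a group of n Ss yields n(n - 1)/2 correlations, so groups of 15, 10, 19, and 20 give 105, 45, 171, and 190 correlations, respectively.

```python
import math
from itertools import combinations
from statistics import median


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sorts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def intragroup_correlations(sorts):
    """Correlate each S's sort with the sort of every other S in the group."""
    return [pearson_r(a, b) for a, b in combinations(sorts, 2)]


def pair_count(n):
    """Number of distinct pairs among n Ss."""
    return n * (n - 1) // 2


# Hypothetical pile placements for four Ss on five items (not study data).
group_sorts = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
]
rs = intragroup_correlations(group_sorts)
print(len(rs), round(median(rs), 2))
print([pair_count(n) for n in (15, 10, 19, 20)])
```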

From this array, it would appear, inspec- 
tionally, that the differences between the 
chronic schizophrenics and other three cate- 
gories are particularly pronounced. Normals, 
it would appear, are superior to all groups, 
and, as was the case with role-taking meas- 
ures, privileged schizophrenics most closely 
resemble tuberculous patients. The tubercu- 
lous appear to have greater homogeneity than 
the psychotics. Consequently, there seems to
be an increasing degree of homogeneity in
going from the chronic schizophrenics through
the privileged and tuberculous Ss to the
normals.


Table 3

Distribution of Intragroup Scores

          C's          P's          T.B.         Normal
Scores    N     %      N     %      N     %      N     %
-.30      1     1
-.25      3     4
-.20      8    11
-.15      6    17      1     2
-.10      4    20
-.05     10    30                   1    .6      1    .5
0        34    63*                  2     2
.05      19    80      3     9      3     4      5     3
.10       8    88      3    16      9     8      6     6
.15       5    93      8    33      7    19     10    12
.20       4    97      4    42     17    29     15    19
.25       2            7    58*    29    45     14    27
.30       1            3    64     13    53*     9    31
.35                    2    69     10    59     18    41
.40                    7    85     16    68     30    57*
.45                    3    91     14    77     38    77
.50                    1    93     15    86     19    87
.55                    1    96     11    92      8    91
.60                    2            6    95     12    97
.65                                 2    96      3    99
.70                                 4    99      1
.75                                 1            1
.80                                 1

* Median rank for each group has been italicized.

The fact that each individual was corre- 
lated with everyone else within his group 
makes for a lack of independence in observa- 
tion that precludes the application of a sta- 
tistical analysis or any conclusive tests of 
homogeneity among the groups. While, there- 
fore, it appears that normal individuals are 
more homogeneous in their agreements and
that schizophrenics agree less among them- 
selves concerning the individual whose roles 
they take, this conclusion must be taken only 
as suggestive in view of the inapplicability 
of rigorous tests of significance. 
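
The lack of independence described above can be made concrete with a small sketch. It assumes hypothetical sort data (the names `S1` to `S4` and their values are invented for illustration, not taken from the study) and uses an ordinary product-moment correlation, which the text does not specify. The point it shows is that n individuals yield n(n - 1)/2 intragroup correlations, and that every individual enters n - 1 of them, so the correlations are not independent observations.

```python
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def intragroup_correlations(sorts):
    """Correlate each individual's sort with every other individual's sort."""
    return {
        (a, b): pearson(sorts[a], sorts[b])
        for a, b in combinations(sorted(sorts), 2)
    }


# Hypothetical item placements for four members of one group.
sorts = {
    "S1": [1, 2, 3, 4, 5],
    "S2": [2, 1, 4, 3, 5],
    "S3": [5, 4, 3, 2, 1],
    "S4": [1, 3, 2, 5, 4],
}

pairs = intragroup_correlations(sorts)
n = len(sorts)

# Number of correlations equals n(n - 1)/2.
assert len(pairs) == n * (n - 1) // 2

# Each individual contributes to n - 1 correlations (the dependence at issue).
appearances = {name: sum(name in pair for pair in pairs) for name in sorts}

group_median = median(pairs.values())
print(len(pairs), appearances, round(group_median, 2))
```

Because one unusual sorter shifts n - 1 entries of the distribution at once, a median of such values describes the group but cannot be treated as a sample of independent scores.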

Discussion 

The findings tend to confirm certain aspects 
of the hypotheses. Others remain moot and 
unsupported. The normal individual’s reac- 
tion appears to be more consistent with the 
hypothesis offered by Lindgren and Robinson 
(7). Rather than responding with any marked 
degree of sensitivity, normals appear to re- 
spond more in terms of a preconceived and 
more universally shared idea of what another 
should be like. It seems likely that “well-ad- 
justed” individuals of Dymond’s studies had 
a good appreciation of cultural norms and ex- 
pectancies from which their judgments were 
derived. In any event, this factor of conven- 
tionality appears to be more important in the 
normal individual’s reaction to another than 
does the other’s idiosyncrasies. 

The chronic schizophrenic is apparently de- 
ficient in both areas. He is indeed the indi- 
vidual, responding on the basis of his own 
fantasies and ideas, with little concern for 
either the accuracy of what he perceives or 
whether his ideas are shared with others. One 
limitation of this study is the lack of cer- 
tainty as to whether the chronic schizophrenic 
patient responded because of a basic impair- 
ment of role-taking skills or because of dis- 
interest in the task. It was hoped that this 
latter was minimized by screening. Following 
an analysis of the results, the question ap- 
peared to remain unanswered. Because of 
practical considerations due to the lapse of 
time, the reliability of the sorts could not be 
obtained. 


Some suggestion that role taking may not 
be deficient in the chronic schizophrenic, but 
merely not utilized, is offered by the results 
of the privileged patients. Assuming that the 
latter were once as severely disturbed (re- 
quiring custodial treatment) as were the 
chronic schizophrenics, partial remission from 
the disease appears to be characterized by a 
hypersensitivity to the feelings of others, a 
phenomenon Fromm-Reichmann aptly titled
“emotional eavesdropping” (6). In contrast 
to normal individuals, however, they appear 
to lack a conventional frame of reference. 
Despite their relative accuracy in role tak- 
ing, each responded to the cues given in a 
very individual manner. 

The results pose at least two problems:
First, what is it that permits or enables the
schizophrenic to respond with such sensitivity?
Second, why does he respond with such
sensitivity?

In reviewing the method, the task appears
to be not unlike a projective or semistruc- 
tured situation. An opportunity was provided 
for interpretation rather than a repetition of 
facts, as so much of the behavior of the Ss 
might be characterized as projective in the 
psychometric sense of the term. The behavior 
of the normal individual was to project a 
preconceived set of ideas based primarily on 
a commonly shared cultural stereotype. The 
privileged patients, responding to the same 
information, made better use of it, although 
their reactions were highly idiosyncratic. Sar- 
bin’s description of the schizophrenic as 
lacking a “generalized other” concept seems 
appropriate to the results here (11). The 
schizophrenic, possibly because of this lack, 
responded to the cues as he perceived them. 

The normal individual, on the other hand, 
may have received the same cues, tested or 
evaluated them, and, in the absence of cor- 
roborating data, rejected many of the hy- 
potheses. The schizophrenic appears to be 
much less critical. In all probability, both
this lack of criticalness and a lack of a con- 
ventional frame of reference, or what Mead 
has termed a “generalized other” (9), serve 
to contribute to the hyperacuity noted. 

Assuming that schizophrenics lack this 
generalized other, such an impairment, with 
resultant hyperacuity, should be found in
younger children, and in emotionally dis- 
turbed children as compared to well-adjusted 
children of the same age. Since such concepts 
as a generalized other are predicated on the 
transition from concern with particular peo- 
ple, via interpersonal relationships, disturb- 
ances in such relationships, as is indicated in 
schizophrenia, would contribute to a failure 
to make the transition. If inconsistency, re- 
jection, and hostility characterize the history 
of the individual with a predisposition to the 
development of schizophrenia, he would look 
on each individual not unlike a sentry on a 
hostile frontier, carefully evaluating and ap- 
praising each one, and never, as do normal 
individuals, taking others for granted, and 
relying on generalized norms of what others 
are like. 

Concerning the tuberculous group, the find- 
ings offer only equivocal suggestions. In terms 
of the results, these Ss appear to most 
closely resemble the privileged schizophrenic. 
Whether this is a function of the debilitating 
social consequences of long hospitalization 
and the nature of the routine, or of some 
“tuberculous personality,” awaits further re- 
search. It is interesting to note that one 
would expect a higher degree of homogeneity 
than appeared to characterize these individu- 
als, since they are sequestered together for a 
long period of time, and have an excellent op- 
portunity to know one another. 


Summary and Conclusions 


Schizophrenics, tuberculous nonpsychotic
patients, and normal individuals were tested 
on their ability to take roles. Empathy was 
determined by their ability to simulate the 
test performance of an author of an auto- 
biography which was provided them. In ad- 
dition, the relative agreement with one an- 
other, within each group, was determined. The 
results suggest the following: 

1. Chronic schizophrenics have impaired 
role-taking skills and are relatively individual 
in their perception. 

2. Normal individuals, while superior to
chronic schizophrenics in role-taking skills,
are more inclined to rely on a conventional
frame of reference, rather than demonstrating
marked role-taking skills.

3. Privileged patients show a marked sensi- 
tivity to others, but appear to respond in a 
highly idiosyncratic manner. 

4. A suggestion was offered that schizo- 
phrenics appear to lack a concept of a “gen- 
eralized other,” and that they never make 
the transition in role-taking skills, from em- 
pathy with particular people to a more con- 
ventional—universally shared—frame of ref- 
erence. 

5. Tuberculous patients appear to be more 
like the privileged or partially remitted schizo- 
phrenics. The findings here, however, can only 
be suggestive in the absence of statistical sig- 
nificance. 

6. The autobiography appears to be a use- 
ful tool in the measurement of empathic 
ability. 


Received June 6, 1955.


Some Psychological Correlates of Humor Preferences¹


Rudolf Grziwok 


The University of California 


and Alvin Scodel 


The Ohio State University 


One hundred and forty male college students
were given the following sequence of
procedures: (a) A series of 40 cartoons on
which ratings of funniness were made; (b)
The TAT (Cards 4, 6BM, 7BM, 13MF, 14,
18BM, 16); (c) The Allport-Vernon-Lindzey
Study of Values.

The 40 cartoons were selected from a larger 
group of 250, all of which had originally ap- 
peared in The New Yorker magazine. Three 
judges placed each cartoon into one of five 
categories defined as follows: 1. Humorous ef- 
fect based on aggression, either explicit or 
deliberately understated; 2. Humorous effect 
obtained by a parody on sex; 3. Humor based 
on the exaggeration or paradoxical use of so- 
cial stereotypes; 4. Humorous effect based on 
obvious and striking logical incongruity; 5. 
No category was applicable or two or more 
categories were equally applicable. 

The first two categories (aggression and
sex) can be subsumed under “orectic humor”
and three and four (social commentary humor
and humor based on logical incongruity) under
“cognitive humor.”

Seventy-three cartoons, excluding agree- 
ments on category 5, were placed in the same 
category by all three judges, a degree of 
agreement exceeding chance at the .001 level. 
The final series of 40 cartoons contained 10
cartoons from each of the first four categories.
No cartoonist was represented in any
category more than twice, and duplication of
themes was avoided as much as possible.


1 An extended report of this study may be obtained
without charge from Alvin Scodel, Psychology
Department, The Ohio State University, Columbus
10, Ohio, or for a fee from the American Documentation
Institute. Order Document No. 4742 from ADI
Auxiliary Publications Project, Photoduplication Service,
Library of Congress, Washington 25, D. C., remitting
in advance $1.75 for microfilm or $2.50 for
photocopies. Make checks payable to Chief, Photoduplication
Service, Library of Congress.

The TAT stories were scored according to
specially constructed scales for degrees of
aggressive and sexual content and, in addition,
intraception vs. extraception. Independent
scoring by the writers of a portion of the
stories resulted in agreements of 90%, 92%,
and 88% respectively; in view of this high
agreement, the remaining stories were scored by
the senior writer.

In the evaluation of results the ratings of 
funniness were converted to standard scores 
and then ranked. The TAT and Allport- 
Vernon-Lindzey scores were dichotomized and 
then compared on the basis of humor scores 
by the use of the Mann-Whitney test. 
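
The evaluation steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' computation: the ratings and the split into "high" and "low" TAT groups are invented, the function names are hypothetical, and the sketch stops at the U statistic without looking up a significance level.

```python
from statistics import mean, pstdev


def standard_scores(values):
    """Convert raw ratings to standard (z) scores."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical funniness ratings for one cartoon category, six subjects.
ratings = [3, 5, 2, 4, 1, 5]
z = standard_scores(ratings)

# Hypothetical dichotomy: first three subjects scored high on a TAT scale.
high_tat, low_tat = z[:3], z[3:]
print(mann_whitney_u(high_tat, low_tat))
```

A small U means the two halves of the dichotomy occupy largely separate ranks on the humor score; a U near half of n_a times n_b means they overlap heavily.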

The significant results indicated that sub- 
jects high in TAT aggression prefer aggres- 
sive humor while those low in TAT aggres- 
sion prefer social commentary humor. With 
respect to value orientations subjects high on 
the aesthetic scale prefer logically incongruous 
cartoons whereas those who are low on the 
same scale prefer aggressive humor. More- 
over, subjects high on the social scale prefer 
aggressive cartoons and those low in the theo- 
retical value show a preference for sexual 
cartoons. In more general terms a preference 
for orectic humor, as opposed to cognitive 
humor, seems to be characterized by more 
fantasy aggression, more extraversion or out- 
goingness, less preoccupation with intellectual 
values, and less psychological subtlety or com- 
plexity. 

Brief Report. 
Received October 10, 1955.


A Dimensional Analysis of Depth of Interpretation¹


H. L. Raush, Z. Sperber, D. Rigler, J. Williams, N. I. Harway,²
E. S. Bordin, A. T. Dittmann,³ and W. L. Hays


University of Michigan 


The research worker interested in problems 
of human interaction is faced with a dilemma. 
The concepts he is concerned with are com- 
plex ones. If he chooses in his research to 
simplify the variables he employs, so as to 
minimize problems involving issues of meas- 
urement, the assumptions he must make ob- 
scure the relationship of his variables to the 
concepts and theories he started from. If, on 
the other hand, he prefers to work with vari- 
ables directly relevant to his theoretical for- 
mulations, he must face a multitude of issues 
in the course of developing adequate methodo- 
logical tools (2). 

This dilemma has come to our attention 
particularly in our studies of depth of inter- 
pretation. This complex variable derives di- 
rectly from psychoanalytic theory (6) and 
has been a subject of disagreement among 
therapeutic theories (3, 5). As a necessary 
step toward testing the differing theoretical 
viewpoints about depth of interpretation, we 
have developed a measure for the concept, 
using conventional methods for constructing 
a rating scale (8). However, a rating scale 
such as we have used and such as is common 
in other clinical studies, forces judges to 
respond as though they were dealing with a 
single dimension, irrespective of the true
complexity of the variable. Therefore, the
dimensionality of depth of interpretation remains
to be studied.


1 This paper is part of an investigation supported
by a research grant M-516 C-2 from the National
Institute of Mental Health of the National Institutes
of Health, United States Public Health Service.
Principal Investigators are: E. S. Bordin, H. L.
Raush; Coordinator for the Project: Z. Sperber.
Part of this paper was presented before the Midwestern
Psychological Association, Columbus, Ohio,
May, 1954.

2 Now at the University of Rochester.

3 Now at the National Institute of Mental Health,
Bethesda, Maryland.

We shall describe four studies designed to
determine: (a) whether depth of interpretation
is uni- or multidimensional; (b) whether
the dimension or dimensions revealed are consistent
with our concept of depth of interpretation;
(c) to what extent other dimensions
can be identified.

If we demonstrate that depth of interpreta- 
tion is in fact unidimensional, we can proceed 
to use the scale and to make more confident 
inferences about its relationship to other varia- 
bles. However, interjudge reliabilities ob- 
tained for rating of single therapist responses 
were low enough to raise questions about the 
likelihood of finding unidimensionality. If we 
uncover evidence of more than one dimension, 
but we judge that these dimensions can be 
incorporated into a revised notion about depth 
of interpretation, then our study will have 
brought about a modification in our concept 
or theory. Also, by being able to identify 
these dimensions we may be able to achieve 
better agreement among raters. Finally, if 
more than one dimension is revealed, any of 
the additional ones may prove irrelevant from 
our point of view and might be susceptible to 
elimination through refinement of the scale 
or through training of judges. 

In summary, our specific purpose in this 
paper is to find out what attributes judges use 
in dealing with depth of interpretation, to 
what extent they agree as to the attributes, 
and to what extent any of these attributes 
corresponds to our definition and scale of 
depth of interpretation. The approach we 
shall discuss is applicable to a wide variety
of clinical studies. It accepts the complexity
of psychological phenomena; it places rela- 
tively little constraint on the data; and it 
allows us to estimate the number and to de- 
scribe some characteristics of dimensions that 
judges are actually using in evaluating com- 
plex attributes. 


General Method 


Definition. Depth of interpretation was de- 
fined for the judges as follows: “any behavior 
on the part of the therapist that is an expres- 
sion of his view of the patient’s emotions and 
motivations—either wholly or in part—is con- 
sidered an interpretation. A patient has vary- 
ing degrees of awareness of his emotions and 
motivations. Depth of interpretation is a de- 
scription of the relationship between the view 
expressed by the therapist and the patient’s 
awareness. The greater the disparity between 
the view expressed by the therapist and the 
patient’s own awareness of these emotions and 
motivations, the deeper the interpretation.” 
In most of the studies the judges were given,
in addition to the definition, a seven-point
scale with nine descriptive statements illustrative
of various points on the scale (8).

Judges. The judges used for all studies were 
Ph.D.’s, staff psychologists, or advanced grad- 
uate students at the University of Michigan, 
all of whom had at least 100 hours of experi- 
ence as psychotherapists. 

Stimuli. All stimuli consisted of excerpts 
from transcripts of recorded therapeutic ses- 
sions. Judges were presented with three con- 
secutive patient-therapist exchanges, ending 
with a therapist’s remark. Only the final thera- 
pist response was to be judged; the preceding 
section served to provide context. Selection of 
the stimuli will be discussed with reference to 
each study. 

Technique of data collection and analysis. 
Since the methods used in this study are of 
recent origin, it seems appropriate to provide 
a brief summary of them. Bennett (1) has 
developed techniques which enable us to 
estimate the number of dimensions in data 
that are collected by methods developed by 
Coombs (4). Hays (7) has expanded the 
procedure so as to provide information on the 
ordering of the stimuli on the several dimen- 
sions. These methods enable us to investigate 


the underlying factors entering into judgments 
of behavior by providing a set of dimensions 
which best account for a given set of judg- 
ments. 

All possible combinations of the stimuli, 
taken three at a time, were presented. The 
task for the judge was to say which two of 
the three stimuli were most alike and which 
two were least alike with respect to depth of 
interpretation (Method of Similarities). In 
the first two studies where six stimuli were 
used, there were 20 such triads; in the third 
and fourth studies using seven stimuli, there 
were 35 triads. These triads were presented 
in a randomized order. Using each stimulus of 
the triad in turn as the reference point, we 
can then rank the other two with respect to 
their psychological distance from this ref- 
erent.* 
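
The triad procedure can be sketched in a few lines. This is an illustrative reading of the Method of Similarities as described here, with hypothetical function names; it reproduces the triad counts given in the text and the conversion the authors illustrate for Stimuli A, B, and D.

```python
from itertools import combinations


def triads(stimuli):
    """All combinations of the stimuli taken three at a time."""
    return list(combinations(stimuli, 3))


def rankings_from_triad(triad, most_alike, least_alike):
    """Turn one triad judgment into a ranking from each stimulus's point of view.

    The most-alike pair is treated as the smallest distance, the least-alike
    pair as the largest, and the remaining pair as intermediate.
    """
    most, least = frozenset(most_alike), frozenset(least_alike)
    pairs = [frozenset(p) for p in combinations(triad, 2)]
    middle = next(p for p in pairs if p not in (most, least))
    distance = {most: 0, middle: 1, least: 2}

    result = {}
    for referent in triad:
        others = [s for s in triad if s != referent]
        others.sort(key=lambda s: distance[frozenset((referent, s))])
        result[referent] = "".join([referent] + others)
    return result


print(len(triads("ABCDEF")))   # six stimuli
print(len(triads("ABCDEFG")))  # seven stimuli
print(rankings_from_triad(("A", "B", "D"), ("A", "D"), ("B", "D")))
```

Each triad therefore contributes one ordered pair to the ranking built around each of its three stimuli.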

All possible paired comparisons are derived 
in association with each reference point. If 
the judgments are transitive—that is, if the 
judge behaves consistently in going from triad 
to triad—the partial rank orders derived from 
the separate triads can be combined to form 
a complete rank order, or I Scale, with the 
reference stimulus coming first in the rank. 
Thus for each judge, the original judgments 
of similarity potentially can yield one I Scale 
from the “point of view” of each stimulus. 
The total number of I Scales that can be ob- 
tained from any one judge is therefore equal 
to the number of stimuli used in the study. 
By means of Coombs’ Unfolding Technique 
(4), we can then determine whether the I 
Scales for each judge are compatible, that is, 
whether they could all have arisen from a 
single dimension. 
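
The step from separate triads to a complete I Scale can be illustrated as below. The sketch covers only the transitivity check for a single reference stimulus; it does not attempt Coombs' Unfolding Technique or the compatibility test across I Scales. The function name and the input format (one "nearer, farther" judgment for every pair of the remaining stimuli) are assumptions made for the example.

```python
def i_scale(referent, nearer_pairs):
    """Combine pairwise judgments into a complete rank order, if transitive.

    nearer_pairs holds one (nearer, farther) tuple for every pair of the
    stimuli other than the referent. Returns the I Scale as a list beginning
    with the referent, or None when the judgments are intransitive.
    """
    pairs = list(nearer_pairs)
    wins = {}
    for nearer, farther in pairs:
        wins[nearer] = wins.get(nearer, 0) + 1
        wins.setdefault(farther, 0)

    m = len(wins)
    distinct = {frozenset(p) for p in pairs}
    if len(pairs) != m * (m - 1) // 2 or len(distinct) != len(pairs):
        raise ValueError("expected exactly one judgment per pair of stimuli")

    # A complete set of pairwise judgments is transitive exactly when the
    # win counts are 0, 1, ..., m - 1 with no repeats.
    if sorted(wins.values()) != list(range(m)):
        return None
    return [referent] + sorted(wins, key=wins.get, reverse=True)


# Consistent judgments around referent A: D nearest, then B, then C.
print(i_scale("A", [("D", "B"), ("B", "C"), ("D", "C")]))

# Circular judgments (D nearer than B, B nearer than C, C nearer than D).
print(i_scale("A", [("D", "B"), ("B", "C"), ("C", "D")]))
```

When the function returns None for a reference stimulus, no I Scale is recovered from that point of view, which is how a judge can yield fewer I Scales than there are stimuli.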

Even if judges differ in their I Scales, if we 
assume that in making their comparisons they 
operate from some segment of a larger space 
in which the stimuli are located, we may pool 
all of the I Scales. Bennett’s techniques (1)
then allow us to estimate the number of dimensions
required to account for the observed
I Scales, and Hays’ techniques (7) may be
applied to locate and order the stimuli in this
larger space.⁵


* For example, if in judging the three Stimuli A,
B, and D, the judge decides that A and D are most
alike and B and D least alike with respect to depth
of interpretation, we would convert these decisions
into rankings as follows: From the “point of view”
of stimulus A the ranking would be ADB; from
the “point of view” of stimulus B the ranking would
be BAD; from the “point of view” of stimulus D
the ranking would be DAB.


Study 1 


The first study was divided into two parts. 
In the first part we tried to determine whether 
10 judges could judge depth of interpretation 
unidimensionally when it was defined for 
them with only the paragraph (quoted above) 
regarding interpretation and depth. In the 
second part we wanted to determine whether 
and how the judgments and dimensionality 
would be modified when the same 10 judges 
were asked to repeat the task, but this time 
were also given as a guide the graphic rating 
scale (8).

Selection of stimuli. Six stimuli were se- 
lected, three from each interview, from two of 
a series of interviews involving different thera- 
pists and patients. Therapist responses in 
these interviews had been rated for depth of 
interpretation as part of another study (8). 
The stimuli were selected so that mean ratings 
varied across the range of depth of interpre- 
tation, and so that the variance of the ratings 
for five out of the six therapist responses was 
at or below the average item variance for 
the interview. 

Results. When judges had only the defini- 
tion available to them, 35 of the 60 possible 
I Scales were obtained and no judge had a 
complete set of compatible I Scales. Since a 
complete I Scale can be recovered from the 
data only when the judge’s comparisons are 
transitive, and since the comparisons will be 
transitive and his I Scales compatible only 
when he treats the concept in a unidimen- 
sional manner, it is evident that no judge 
handled the concept of depth of interpreta- 
tion unidimensionally. At least three dimen- 
sions were necessary to account for the 35 ob- 
tained I Scales, but because of the limited 
amount of data and the evident confusion of 
the judges, we could not order the stimuli on 
these dimensions. 

5 An outline of the method and a step-by-step
illustration of the procedure as applied to our data
have been prepared by Dr. Hays and will be made
available on request.


Where both the definition and the graphic
rating scale were available to the judges, there
were 53 complete I Scales out of the possible 
60. The data of three judges now yielded 
complete and compatible sets of I Scales. In 
analyzing the pooled data, three dimensions 
again were required, indicating that despite 
the addition of pretested points on the scale 
to the careful definition of depth of interpre- 
tation, the judges did not deal with the 
excerpts as though depth of interpretation 
were a single continuum. 

The stimuli on the first dimension could be 
clearly ordered and inspection of them al- 
lowed us confidently to name the dimension 
depth of interpretation. The six excerpts could 
be ordered on the basis of the means of the 
ratings that had been obtained in the study 
referred to previously. The rank orders ar- 
rived at in these two diverse fashions corre- 
lated .89. 
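
For readers unfamiliar with rank-order correlation, a minimal sketch follows. It uses the standard formula for rho without ties; the two rank orders are hypothetical and were chosen only to show one way a value of about .89 can arise with six stimuli, not to reproduce the authors' orderings.

```python
def spearman_rho(ranks_a, ranks_b):
    """Rank-order correlation for two untied rankings of the same items."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical rankings of six excerpts: order recovered on the first
# dimension versus order of the mean ratings. Two adjacent pairs are
# interchanged, so the sum of squared rank differences is 4.
recovered_order = [1, 2, 3, 4, 5, 6]
rating_order = [2, 1, 3, 4, 6, 5]

print(round(spearman_rho(recovered_order, rating_order), 2))
```

With only six stimuli the coefficient moves in coarse steps, so a single interchange of neighboring excerpts already lowers rho from 1.00 to about .94.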


Study 2 


Was it the improved frame of reference 
supplied by the graphic scale, or prior prac- 
tice on the task which led to the improved 
results on Part 2 of the study reported above? 
To investigate this, a second study was initi- 
ated using the same six excerpts and 10 new 
judges. These judges did the task only once 
using both the definition of depth of interpretation
and the scale as a frame of reference.
In this study 45 out of 60 possible I
Scales were recovered, suggesting that the 
scale items did improve the frame of reference, 
but that practice probably helped too. Three 
dimensions were required. The stimuli could 
be ordered on the first and second dimensions, 
but only the polar stimuli of the third dimen- 
sion could be ascertained. The first dimension 
was clearly depth of interpretation. 

To supply more material, the 53 I Scales of 
Part 2, Study 1, and the 45 I Scales of Study 
2 were pooled. Again depth of interpretation 
emerged sharply as the first dimension, the 
rank-order correlation with the ordering based 
on the average of the previous rating data 
being .94. The third dimension still could not 
be identified, but the ordering of stimuli on 
the second dimension suggested either of two 
interpretations. Judges could be reacting according
to the degree of emotionality they
anticipated the therapist’s response would
arouse in the patient (hereafter referred to as
emotionality); or judges could be reacting to
the degree of ambiguity of the intent of the
therapist’s response (hereafter referred to as
ambiguity).


Study 3 


If the secondary factors could be specified 
more accurately and if they could be demon- 
strated to be general rather than specific to 
the six stimuli we happened to use, then 
judges could be trained to recognize and cor- 
rect for the interference of such variables 
with judgments of depth of interpretation. 
Accordingly, a third study was run. 

Here judgments were again made with re- 
spect to depth of interpretation. However, 
since we were interested primarily in clarify- 
ing the meaning of the second dimension, 
stimuli were selected so as to maximize its 
possible influence. An important proof of the 
validity of our interpretation of the second 
factor would be to find that this second factor 
now becomes the primary source of the order- 
ing of stimuli. 

Selection of stimuli. Three stimuli from 
those previously studied were included. They 
were the first, middle, and last stimuli on the 
second dimension. If the same ordering of 
these “marker” stimuli was not maintained in 
the present study, we would be forced to re- 
ject the hypothesis that a general secondary 
dimension operated. 

In addition to the three “markers,” four 
new stimuli were chosen in the following 
manner: from three new interviews we selected 
ten stimuli, which on the basis of inspection 
seemed to represent various degrees of both 
emotionality and ambiguity. These ten stimuli 
and the three “marker” stimuli were ranked 
for emotionality and ambiguity by six project 
staff members. The four stimuli were chosen 
on the basis of the mean rankings, so that 
ambiguity and emotionality would be inde- 
pendently distributed, and so that a range 
would be represented on both these variables. 
The project staff also ranked the final seven
stimuli for depth of interpretation. All possible
combinations of the seven stimuli, taken
three at a time, provided 35 triads for judgment.
For each judge there were potentially
seven I Scales.

Judges. Only five new judges were available.
In order to provide sufficient judgments
about the relations among stimuli, the entire
series of judgments was made twice. Between
the two sets of judgments on these seven stimuli,
the judges made similar judgments on another
group of stimuli discussed in Study 4 below.
By this procedure we attempted to maximize
independence between the two sets of judgments.

Results. Forty-five out of the 70 possible 
I Scales were obtained. Three dimensions 
were required to account for these data. The 
rankings by the project staff provide criteria 
for determining whether any of these three 
dimensions could be interpreted as represent- 
ing either emotionality, ambiguity, or depth 
of interpretation. Although we expected depth 
of interpretation to show up less strongly in 
stimuli chosen for the purpose of carrying the 
secondary qualities, again the first dimension 
was depth of interpretation, correlating (rho) 
.93 with the pooled rankings. The ordering of 
the stimuli on the second and third dimen- 
sions was not consistent with the criteria for 
either emotionality or ambiguity. Inspection 
of the orderings gave no clue for interpreta- 
tion of these secondary dimensions. We must 
conclude that the secondary dimensions op- 
erating in judgments of depth of interpreta- 
tion were specific to the judges and the stimuli 
studied. 


Study 4 


To test the generality of the depth of interpretation
dimension, the first, middle, and
last stimuli from the first dimension (depth of
interpretation) of Studies 1 and 2 were used
as “markers,” in conjunction with four new
stimuli. These new stimuli were selected in 
the following manner: A sample of seven was 
taken from the three additional interviews 
used in Study 3; the stimuli were chosen 
initially because they seemed to represent 
the depth of interpretation dimension. These 
seven stimuli plus the three “markers” were 
ranked for depth of interpretation by project 
staff members. The four new stimuli were
then selected on the basis of their mean rankings
varying across the range of depth of interpretation.
The seven stimuli, three “markers”
plus four new stimuli, provide 35 triads
for judgment and seven potential I Scales for
each judge.

Judges. The five judges of Study 3 went 
through the series of triads twice. As noted 
above, to maximize the independence of these 
repeated judgments, work on the stimuli of 
Study 3 intervened between the two series. 

Results. Forty-five out of the 70 possible 
I Scales were obtained. Two dimensions were 
required to account for these data. Except 
for one of the stimuli, a single dimension would 
have been sufficient. The first dimension was 
clearly depth of interpretation and agreed 
(rho) .89 with the order of the pooled rank- 
ing data. Agreement with the order of the 
original “markers” was perfect. 


Discussion 


The researches reported have demonstrated 
the applicability of a method of data collec- 
tion and analysis to the study of a variable 
in psychotherapy—depth of interpretation. 

We had been concerned with the possibility 
that the concept of depth of interpretation, 
despite the importance attributed to it in 
theoretical thinking, might, in fact, have only 
limited utility. However, in each of the four
studies, involving a sample of stimuli from 
five interviews with different patients and 
therapists, depth of interpretation was con- 
sistently the primary dimension. Although the 
method of analysis is not sufficiently devel- 
oped to allow for the quantification of vari- 
ance, the logic underlying it permits us to 
state with confidence that more variability in 
our data can be attributed to the depth of 
interpretation dimension than to any of the 
other dimensions operating. 

We had also been concerned with the low 
interjudge reliabilities observed when we at- 
tempted to measure the depth of interpreta- 
tion of each therapist response with a care- 
fully developed rating scale (8). One possible 
source of unreliability may be that the use 
of the rating scale forced judges to treat as 
unidimensional a complex aspect of interview 
material involving several dimensions. How- 
ever, despite the extremely complex nature of 
the material rated, there does seem to be a 
significant area of agreement between judges. 
The high correlations of the recovered
depth of interpretation dimensions with
pooled ratings or rankings by judges indicate
that ratings may be employed despite low individual
reliabilities. That is, in using ratings
of depth of interpretation, it may be possible 
to overcome problems of low reliability of 
single judgments on this attribute by utilizing 
average ratings of several judges. Since depth 
of interpretation, the dimension in which we 
are interested, contributes most strongly to 
the judgments, pooling judgments may maxi- 
mize its contribution and reduce the effect of 
secondary factors, specific to the ratings of 
individual judges. 
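
The gain to be expected from pooling can be illustrated with the Spearman-Brown formula. This formula is not part of the analyses reported here and is introduced only as an assumption for illustration; it treats the judges as interchangeable, which the secondary dimensions noted above suggest is at best approximately true. The function name and the example figures are hypothetical.

```python
def pooled_reliability(single_judge_r: float, n_judges: int) -> float:
    """Spearman-Brown estimate of the reliability of the mean of n_judges ratings."""
    if n_judges < 1:
        raise ValueError("n_judges must be at least 1")
    return n_judges * single_judge_r / (1 + (n_judges - 1) * single_judge_r)


# Hypothetical single-judge reliability of .40, pooled over 1 to 5 judges.
for k in range(1, 6):
    print(k, round(pooled_reliability(0.40, k), 2))
```

Under these assumptions the estimate rises with each added judge, which is the sense in which average ratings may be usable even when single judgments are not.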

The experimental method outlines a pro- 
cedure which can be applied to the study of 
many of the complex variables discussed in 
psychotherapeutic and clinical research. The 
assumption of a single dimension, implicit in 
the construction of a rating scale, often masks 
the fact that additional factors are influencing 
a phenomenon. Lack of awareness of such 
factors often leads to experimental results 
which cannot be clearly interpreted. The type 
of research we have reported allows the ex- 
perimenter to ascertain whether his variable 
is, in fact, a unidimensional attribute, and, if 
not, under what conditions the powerful and 
convenient rating scale technique may legiti- 
mately be applied. 


Summary 


Four studies of the dimensional character- 
istics of a psychotherapeutic variable, depth 
of interpretation, were reported. Depth of in- 
terpretation was shown to be treated by judges 
not as a unitary dimension but as at least 
three dimensions: The primary dimension 
consistently was depth of interpretation; the 
secondary dimensions varied depending on 
the judges and stimuli used. The implications 
of these results for research with complex 
psychological variables were discussed. 


Received May 9, 1955. 
Rating Personal Adjustment Through an Analysis 
of Social Reinforcements 


Vaughn J. Crandall and Ursula Bellugi 
The Fels Research Institute for The Study of Human Development 


Methods for assessing adjustmental aspects 
of behavior have often been unrelated to 
any consistent personality theory. As a re- 
sult, behavior sampled by these methods has 
frequently been analyzed according to a 
hodgepodge of empirically derived, and theo- 
retically unrelated, adjustment indicators. 
Current “sign approach” methods for assess- 
ing adjustment reflected in Rorschach re- 
sponses are a case in point. Such empirically 
derived indicators of adjustment can seldom 
be used to analyze behavior evoked by as- 
sessment techniques other than those for 
which they were devised. The aim of the 
present study was to deal with adjustmental 
aspects of behavior from the point of view of 
one theory of personality, and to develop a 
method of analysis within this context which 
might be applied to behavior sampled by a 
variety of personality assessment techniques. 

The method of analysis developed in the 
study derives its theoretical basis from the 
Social Learning Theory of Personality of 
Rotter and his students. This theory contains 
a construct of freedom of movement which 
has been defined as, “The mean expectancy 
of obtaining positive satisfactions as a re- 
sult of a set of related behaviors directed to- 
ward the accomplishment of a group of func- 
tionally related reinforcements ... ” (2, p. 
194). The construct of freedom of movement 
is an abstraction concerning the adjustmental 
aspects of an individual’s behavior in relation 
to his specific needs or social goals. Inferences 
concerning an individual’s over-all freedom 
of movement would, of course, necessitate 
evaluating an individual’s behavior in rela- 
tion to all his major needs or goals. This 
could be done by ascertaining the relative 
frequency with which his past behavior di- 
rected toward these goals had resulted in 
success and satisfaction or failure and dis- 
satisfaction. A complete study of an indi- 
vidual’s past history of social reinforcements 
would, of course, be impossible. It is feasible, 
however, to obtain samples of his current be- 
havior and to arrive at an estimate of his 
present freedom of movement through the 
analysis of these behavior samples. The So- 
cial Reinforcement Index (SRI) of the pres- 
ent study was developed with this aim in 
view. 


Method 
Subjects 


The Ss of the study were 87 mothers of 
children enrolled in a longitudinal study of 
maternal behavior and child development at 
the Fels Research Institute for the Study of 
Human Development. Most of the Ss were 
members of the middle socioeconomic class. 
Their distribution on the Index of Status 
Characteristics of Warner et al. (3) was: 
upper-lower class, 17%; lower-middle class, 
59%; upper-middle class, 20%; and lower- 
upper class, 4%. The Ss were more intelligent 
and better educated than national averages. 
Their median Wechsler-Bellevue full-scale IQ 
was 123. Concerning their educational back- 
ground, 7 per cent had not completed high 
school, 29 per cent were high school gradu- 
ates, and 64 per cent had attended college. 


Protocols 


While the SRI method of analysis is po- 
tentially useful for the assessment of free- 
dom of movement reflected in behavior sam- 
pled by a variety of personality assessment 
techniques, the present study was limited to 
the investigation of its applicability for the 
analysis of two separate kinds of protocols: 
(a) a clinical psychologist’s descriptions of 
the characteristic behavior of the Ss in vari- 
ous areas of their life, and (b) interviews with 
Ss concerning their behavior in these areas. 
The former protocols will be called observer 
descriptions and the latter will be designated 
subject interviews. 

The observer descriptions were obtained 
from a clinical psychologist, who, as a Fels 
Home Visitor, had regularly visited the Ss in 
their homes at least twice a year as part of a 
continuous Fels Home Visit Program investi- 
gating maternal behavior and child develop- 
ment (1). In addition, the Home Visitor had 
observed and interviewed the Ss from time to 
time at the Fels Institute. The observer de- 
scriptions, based on the Home Visitor’s total 
knowledge of each S, covered the following 
areas: (a) general methods of dealing with 
day-to-day situations, (b) housework and 
home-centered activities, (c) maternal be- 
havior, (d) marital relationship, (e) inter- 
actions with friends, and (f) social and com- 
munity activities. These descriptions were 
spontaneous verbal reports told to the senior 
author by the Home Visitor without previous 
preparation. They were electrically recorded 
and typed for subsequent SRI analysis. They 
averaged just under five hundred words in 
length. 

The subject interviews were conducted by a 
second clinical psychologist, also a Fels Home 
Visitor, who individually interviewed 41 Ss 
at the Fels Institute.1 These interviews cov- 
ered the same areas of the Ss’ activities as 
did the observer descriptions. During the in- 
terviews standard questions were employed, 
designed to elicit not only the S’s descrip- 
tions of experiences in each area, but also 
her satisfactions and dissatisfactions with 
these experiences. The interviews averaged 
one and one-quarter hours in length with the 
Ss’ verbalizations during the interviews av- 
eraging approximately seven thousand words. 


1 The authors would like to express their apprecia- 
tion to Fels Home Visitors, Dr. Joan Lasko and 
Mrs. Anne Preston, for their participation in the 
study. 


These interviews were also electrically re- 
corded and typed for subsequent SRI analysis. 


Social Reinforcement Index Ratings 


While the SRI method of analysis results 
in a global estimate of an S’s over-all freedom 
of movement, the ratings upon which this 
estimate is based are not global ones. These 
were purposefully avoided. Global ratings 
have the mixed blessing of allowing the rater 
to base his judgments on the total context 
of the situation as he views it. He is free to 
attend selectively to certain behavioral refer- 
ents and to minimize or ignore others. He 
weighs the importance of a given instance of 
behavior as he sees fit. Such ratings are thus 
dependent upon the rater’s personal system 
of “clinical beta weights.” The moderate to 
poor interrater agreement frequently obtained 
with global ratings may sometimes have been 
the result of the individualistic nature of this 
process. 

One potential answer to this problem, and 
the one used in SRI analysis, is the utiliza- 
tion of specific, relatively small, yet psycho- 
logically meaningful, rating units. With such 
a method, the rater is not free to base his 
final judgment on his idiosyncratic decisions 
concerning the relative importance of various 
aspects of the behavior he is rating. Rather, 
his sole task is to note when scorable behav- 
ior occurs, and to categorize the nature of this 
behavior. Such a process simplifies the rater’s 
task, yet still allows him to make psycho- 
logically meaningful judgments. 

The scoring unit of the SRI was defined as, 
“any reported attitude or behavior of the S 
or of persons with whom he interacts which 
indicates the effect of this experience on the 
S.” The unit was scored positive when it con- 
tained an experience which was judged satis- 
fying or rewarding for the S; the unit was 
scored negative when the social reinforcement 
was judged to be an unsatisfying, nonreward- 
ing or frustrating one. In general, raters were 
instructed to rate only behavioral referents 
from which they could directly infer satisfac- 
tion or dissatisfaction on the part of the S 
being rated. However, occasionally, when the 
satisfaction or dissatisfaction of the S was 
not directly inferable, it was possible to judge 
the expected effect of the experience upon the 
S. On these occasions, the raters judged the 
potential positive or negative social reinforce- 
ment inherent in such behavior in terms of 
the usual reinforcement value for this kind of 
behavior within the S’s cultural groups.2 

The SRI was developed to simplify the rat- 
ing process and should, if it were successful 
in its aim, be usable by raters with little psy- 
chological training and experience as well as 
by experienced clinicians. To test this, psy- 
chologically inexperienced persons were used 
as SRI raters in the present study. The ob- 
server descriptions and the subject interviews 
were rated by two senior undergraduate stu- 
dents majoring in psychology.3 These raters 
had no contact with, or knowledge of, the Ss 
whose protocols they rated. A training session 
preceded the actual rating of the protocols. 
During the training session, the SRI method 
of analysis was explained to the raters by the 
senior author, the raters made practice ratings 
on a half-dozen protocols not used in the 
study, and these practice ratings were dis- 
cussed in relation to original scoring instruc- 
tions. The raters then independently rated the 
protocols of the study. The final SRI score 
for each protocol was obtained by subtract- 
ing the sum of the units scored for negative 
reinforcement from the sum of the positive 
reinforcement units, and dividing this by the 
total number of scored units. 
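
The scoring rule just described can be sketched as a small function. This is a minimal illustration and assumes that every scored unit is either positive or negative, so that the total is the sum of the two counts; the function name and the example counts are hypothetical.

```python
def sri_score(positive_units: int, negative_units: int) -> float:
    """Return (positive - negative) / total scored units for one protocol."""
    total = positive_units + negative_units  # assumption: no other unit types
    if total == 0:
        raise ValueError("protocol contains no scored units")
    return (positive_units - negative_units) / total


# Hypothetical protocol: 30 units scored positive, 10 scored negative.
print(sri_score(30, 10))  # 0.5
```

On this definition the score runs from -1.0, when every unit is negative, to +1.0, when every unit is positive, regardless of the length of the protocol.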


Results 


The SRI ratings of the protocols were com- 
pared with Home Visitors’ ratings of the Ss’ 
over-all freedom of movement. The first Home 
Visitor rated the 72 Ss she had discussed in 
the observer descriptions, and the second 
Home Visitor rated the 41 Ss she had inter- 
viewed. Each Home Visitor based her ratings 
on her total knowledge of each S gained from 
all observations and interviews of that S in 
her home and at the Fels Institute. These 
ratings were thus based on relatively exten- 
sive knowledge and made by experienced 


2 Mimeograph copies of examples of reinforcement 
units scored for positive and negative reinforcement 
in the observer descriptions and subject interviews 
may be obtained from the authors. 

3 The Social Reinforcement Index raters were the 
junior author and Miss Barbara Fisher. 


clinical psychologists. The SRI ratings, in 
contrast, were made by undergraduate stu- 
dents with little psychological experience, and 
were based solely on the Ss’ protocols. 

The junior author’s SRI scores of the ob- 
server description protocols were correlated 
with the first Home Visitor’s ratings of the 
Ss she had described.4 An r of .81 was ob- 
tained, indicating high agreement between 
SRI analyses of the descriptions and the 
Home Visitor’s judgments of the Ss’ over-all 
freedom of movement. The junior author’s 
SRI scores of the 41 Ss’ verbalizations dur- 
ing the subject interviews were correlated 
with the second Home Visitor’s ratings. The 
r was .67; SRI analyses of the Ss’ responses 
during a single interview bore considerable 
relationship to the Ss’ observed freedom of 
movement as rated by the second Home 
Visitor. 

Information concerning the reliability of 
the SRI ratings was also obtained. Forty of 
the observer descriptions were randomly 
picked, and the two SRI raters’ scores for 
these protocols were correlated. The correla- 
tion was .96. The correlation between the SRI 
scoring of the two raters on the 41 subject 
interview protocols was .87. Intrarater reli- 
ability of SRI scoring was obtained by com- 
paring the original ratings of the junior au- 
thor with reratings done one month later. 
The correlations were .96 for the observer 
descriptions and .92 for the subject inter- 
views. 


Discussion 


The general results of the study are en- 
couraging. The SRI analyses of the protocols 
resulted in reasonably reliable estimates of 
the Ss’ over-all freedom of movement. In 
addition, even with psychologically inexperi- 
enced raters, inter- and intrarater agreement 
was high. These conclusions, of course, must 
be limited to the kinds of protocols analyzed 
in the study. While the SRI appears to be a 
fruitful method for analyzing such diverse 
data as TAT stories, incomplete sentence re- 
sponses, autobiographical material, or time 


4 All correlations reported are Pearson r’s. Distri- 
butions of scores upon which these correlations were 
based were all normal, and all relationships were 
rectilinear by inspection. 
samples of ongoing social behavior, its ap- 
plicability for these data must be assessed in 
future research. Finally, it should be pointed 
out that the use of the SRI is not necessarily 
restricted to the evaluation of over-all free- 
dom of movement as in the present study. 
It may also prove useful, both clinically and 
experimentally, for the assessment of freedom 
of movement in respect to specific needs, 
value areas or social roles. 


Summary 


The aim of the study was to deal with ad- 
justmental aspects of behavior from the point 
of view of one theory of personality, and to 
develop a method of analysis within this con- 
text which might be applied to behavior sam- 
pled by a variety of personality techniques. 
The Social Reinforcement Index was devel- 
oped for this purpose. 

The Social Reinforcement Index method of 
analysis was employed by two raters to ana- 
lyze independently two kinds of protocols: 


(a) a psychologist’s descriptions of Ss’ be- 
havior in various areas of their life, and (b) 
recorded interviews with Ss concerning these 
areas. The Social Reinforcement Index scores 
derived solely from the ratings of these pro- 
tocols by relatively inexperienced raters were 
compared with ratings of the Ss’ freedom of 
movement by psychologists who had exten- 
sive experience interviewing and observing the 
Ss. Correlations were high. Social Reinforce- 
ment Index ratings also had high interrater 
and intrarater reliability. 


Received May 13, 1955. 
The purpose of this paper is to present a 
method for the delineation of an individual’s 
conceptual structure. It has long been an ac- 
cepted tenet of projective psychology that 
each individual imposes his own unique struc- 
ture on the world about him. Accepting this 
as a basic assumption, one is next required to 
tease out the dimensions utilized by the indi- 
vidual in this structuration. This is a com- 
mon goal of the various extant projective 
techniques. But whereas in these techniques 
the final delineation rests rather heavily upon 
an inferential structure which may vary 
widely from one test analyst to another, it is 
proposed that the technique to be described 
below reduces such inferential variability to 
a minimum. 


The Role Construct Repertory Test and 
Underlying Assumptions 


The following assumptions are believed to 
be the minimum necessary for the derivation 
of the present technique: 

a. For each individual there exists a uni- 
verse of persons which constitutes his social 
environment. 

b. Each individual possesses a repertoire of 
constructs which is relatively stable over a 
period of time, and which he utilizes in struc- 
turing his social environment. 

c. Constructs contained in a given indi- 
vidual’s repertoire bear a relationship to each 
other such that they may be ordered to cer- 


1 This study draws heavily upon the Personal Con- 
struct Theory of George A. Kelly (2), although the 
authors accept full responsibility for the assumptions 
and present form of this paper. The data upon which 
this article is based were collected while the authors 
were in residence at the Ohio State University. 
tain basic dimensions which define the pa- 
rameters of his construct repertoire. 

d. The structure of an individual’s social 
environment may be duplicated by an ob- 
server through knowledge of the parameters 
of his construct repertoire. 

An individual’s constructs are viewed in 
much the same fashion as the constructs of 
the scientist. They are the ways in which or- 
der is brought to the unordered, and the bases 
upon which predictions about outcomes of 
behavior are made. Defining a construct as 
a way in which two or more things are alike 
and at the same time different from one or 
more other things, Kelly (2) has developed 
the Role Construct Repertory Test (RCRT) 
as a means of eliciting an individual’s con- 
struct repertoire. 

One form of the RCRT consists of two 
parts: In the first part there is a list of 15 
role titles such as: Your mother or the per- 
son that played the part of your mother; A 
person who for some unknown reason dis- 
liked you; A teacher for whom you had a 
great deal of respect; etc. For each of these 
titles S writes down one name and may not 
use the same name twice. In the second part 
S is presented with 15 combinations of these 
people taken three at a time and is asked to 
indicate in a word or phrase how any two in 
each triad are the same and at the same time 
different from the third. This constitutes a 
construct. He is then asked to indicate what 
he feels to be the opposite of the construct 
he has just listed. Thus there are elicited 15 
constructs and their opposites. Using college 
students and a slightly different form of the 
RCRT, Hunt (1) obtained a test-retest agree- 
ment of 70 per cent on constructs used, in- 
dicating some generality for the constructs 
elicited. 
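
The second part of the test draws its triads from the 15 named persons. A minimal sketch of that step follows. The placeholder role labels and the random choice of 15 triads are assumptions made for illustration, since the description above does not say how the 15 combinations were selected.

```python
import random
from itertools import combinations
from math import comb

# Placeholder labels standing in for the 15 role titles (hypothetical).
role_titles = [f"role_{i}" for i in range(1, 16)]

# Every way of taking the 15 persons three at a time.
all_triads = list(combinations(role_titles, 3))
assert len(all_triads) == comb(15, 3)

# Assumption: 15 distinct triads are drawn at random for presentation.
rng = random.Random(0)
presented = rng.sample(all_triads, 15)

for triad in presented:
    print(triad)
```

Each presented triad yields one construct and its opposite, so 15 triads give the 15 construct pairs described above.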

Having obtained a sample of the individu- 
al’s construct repertoire the next problem in- 
volves ordering these to certain dimensions 
which may be said to represent the parame- 
ters of his construct repertoire. It will be ap- 
parent to the reader at this point that this is 
a similar problem to that found in psycho- 
metrics, where one has a large number of 
scores on various tests and wishes to reduce 
these to more meaningful dimensions which 
will explain whatever relationship exists be- 
tween these scores. The method of choice in 
psychometrics appears to be factor analysis 
and this method has been applied to the pres- 
ent study. Thus we propose to present the 
application of factor analysis to an individu- 
al’s RCRT protocol as a means of determin- 
ing the manner in which he structures his 
social environment. 

Although some similarity to Osgood’s method 
of deriving his Semantic Differential (4) will 
be noted, one important difference is that the 
dimensions arrived at by the present approach 
are unique for each individual, whereas the 
scales contained in the Semantic Differential 
are nomothetic in nature, having been de- 
rived from a large group of Ss. While Os- 
good’s procedure permits the specification of 


the meaning of concepts in a multidimensional 
space defined by his instrument, we would 
claim that the factor analysis of an RCRT 
protocol permits, first, the specification of a 
multidimensional conceptual space unique for 
a given individual, and second, the location 
of persons in this space. 


Procedure 


The RCRT as described above was admin- 
istered to each of four Ss.2 The resulting con- 
structs in each protocol were then designated 
as five-point rating scales on which S was to 
rate each of the 15 individuals named in the 
first part of the test. The low end of the scale 
meant that the construct was very typical of 
the person, while the high end meant that the 
contrasting or opposite construct was very 
typical. In order to reduce any halo effect, 
the S was asked to consider one construct at 
a time, rating all of the individuals in suc- 
cession. 

One can thus conceive of the constructs as 
tests on which there are as many scores as 
there are people rated. It is recognized that 
the correlations which result from this pro- 
cedure will be affected by the fact that the 
individuals rated on a particular construct 
may have also entered into the formulation 


2 In order to conserve space only two of the four 
cases are presented. 


Table 1 

Constructs and Orthogonal Factor Loadings for Case I 

                                                          Factor loadings 
Construct                 Contrasting construct           I   II  III   IV   h² 

 1. unassuming            pretentious                   -19   90   27  -09   93 
 2. creative              uncreative; static             24   27   90   12   95 
 3. rigid                 flexible                       03  -55  -55  -53   89 
 4. original; daring      unoriginal; conventional       03  -01   40   58   50 
 5. sincere; unaffected   insincere; affected           -19   91   33   07   98 
 6. aggressive            submissive                     13  -58  -59   30   79 
 7. sexual                asexual                        84   00   03   10   72 
 8. intelligent           stupid                         06   29   92   02   93 
 9. refined               crude                          00   87   31   19   89 
10. intellectual          non-intellectual               15   12   89   25   89 
11. non-productive        productive                     05  -02  -62  -25   45 
12. liberal               conservative                   40  -17   62   51   83 
13. ascetic               sensual                       -75   15   04  -52   86 
14. introjected values    experience-derived values     -07  -11  -54  -33   42 
15. socially inept        socially skilled 

of that construct. It was apparent, however, 
that the correlations were not uniformly af- 
fected since they range from high positive to 
high negative. 
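
Treating each construct as a "test" and each rated person as a "score" leads directly to a matrix of intercorrelations between constructs. The sketch below illustrates the idea on a reduced grid. The ratings are invented for illustration (six persons rather than 15, three constructs rather than 15), and the helper name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a construct with constant ratings has no correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical five-point ratings of six persons on three constructs
# (1 = construct very typical, 5 = contrasting construct very typical).
grid = {
    "unassuming": [1, 2, 5, 4, 1, 3],
    "sincere":    [1, 2, 4, 5, 1, 3],
    "aggressive": [5, 4, 1, 2, 5, 3],
}

# One correlation per pair of constructs; this is the input to the factoring.
correlations = {
    (a, b): pearson_r(grid[a], grid[b]) for a, b in combinations(grid, 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

In this invented grid the first two constructs correlate positively and the third correlates negatively with both, which is the kind of spread from high positive to high negative referred to above.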

The intercorrelations between constructs for 
each S were factor analyzed using the Thur- 
stone multiple group method. Factors were 
rotated to orthogonal simple structure. The 
loadings were then adjusted by the Wherry- 
iterative procedure (5) so that no residuals 
in any of the tables exceeded ± .15. Ninety- 
five per cent of the residuals in each of the 
tables were within ± .10. Loadings of .30 
or greater were considered significant. 
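
For orthogonal factors the communality of a construct is the sum of its squared loadings, and the .30 criterion can be applied mechanically. The sketch below uses three rows of loadings taken from Table 1. Two points are assumptions: the criterion is applied to the absolute value of a loading, and the tabled h² values are taken to be simple sums of squares, which the iterative adjustment may not reproduce exactly in every row. The function names are hypothetical.

```python
# Loadings on Factors I-IV for three constructs of Case I (from Table 1).
loadings = {
    "unassuming-pretentious": [-0.19, 0.90, 0.27, -0.09],
    "sexual-asexual":         [0.84, 0.00, 0.03, 0.10],
    "intelligent-stupid":     [0.06, 0.29, 0.92, 0.02],
}


def communality(row):
    """Sum of squared loadings across orthogonal factors."""
    return sum(v * v for v in row)


def significant_factors(row, threshold=0.30):
    """1-based factor numbers whose absolute loading meets the criterion."""
    return [i + 1 for i, v in enumerate(row) if abs(v) >= threshold]


for name, row in loadings.items():
    print(name, round(communality(row), 2), significant_factors(row))
```

For these three rows the rounded sums of squares agree with the tabled h² values of .93, .72, and .93.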


Results 


Since the purpose of this paper is primarily 
methodological and heuristic, no data will be 
presented other than the constructs with their 
factor loadings and a brief description of the 
S. 

Case I. A 31-year-old, single male gradu- 
ate student, considered a “capable” person 
by his associates. He comes from one of the 
“old families” of the South and has de- 
scribed his home as somewhat puritanical. 
He is much concerned with problems of crea- 
tivity and personal freedom. 

Table 1 presents his construct repertoire 
and orthogonal factor loadings. We consider 
these factors to be an operational definition 
of this S’s mode of structuring his social en- 
vironment. To attempt to give names to these 
factors would appear to be of dubious value, 
while inspecting each factor for those con- 
structs with significant loadings will provide 
us with some insight into the ways in which 
the S attempts to make sense out of his 
world.3 

Factor I is clearly concerned with sexu- 
ality, with the probable identification of 
sexual expression with liberality and sexual 
repression with conservatism. One might sus- 
pect here that since he describes his family 
as puritanical, he would perceive them as 


3 While no attempt is made in this article to es- 
tablish the validity status of the RCRT or our 
method of analysis, it might be mentioned paren- 
thetically that a faculty member in the department 
in which this S was a student identified the student 
immediately upon reading this analysis of his proto- 
col, although he had no knowledge that the student 
had taken part in the study. 
conservative and hence not condoning sexual 
expression. The fact that at 31 he is not mar- 
ried may indicate a closer tie to his family 
than he would be willing to admit. These of 
course are in the nature of low order hy- 
potheses which such a protocol, taken to- 
gether with other information, might generate. 

Factor II appears to be an affective or atti- 
tudinal dimension by means of which people 
are seen as either unassuming or pretentious 
with the further implication that persons who 
are unassuming, submissive, and refined are 
more to be trusted and relied upon than those 
for whom the opposite might be true. One 
would be led to predict here, among other 
things, that this person might have difficulty 
relating to persons in certain occupations or 
social strata where a premium is placed on 
aggressiveness. Similarly he may be an un- 
duly submissive individual because of the 
negative implications of aggressiveness. How- 
ever, he would probably see this more as be- 
ing flexible than submissive. 

Factor III seems to reflect his concern with 
intellectual freedom and creativity. Interest- 
ingly, we also find that the constructs “aggres- 
sive-submissive” and “refined—crude” have 
significant loadings on this factor. Here it 
appears that creativity and intelligence also 
carry certain emotional connotations for him 
and that, perhaps, he is not able or willing to 
accept a person purely on the basis of his ac- 
complishments in a given area, but also looks 
for certain nonintellectual components as well. 

Factor IV may be the dimension by which 
he discriminates between people who are like 
his family and those who are not. If we were 
interested in transference relationship prob- 
lems, we might view this factor as mediating 
the transference. We might expect, if this 
were true, that having placed a nonfamily 
figure on the same end of this dimension as 
his family, he would then transfer to them 
many of the attitudes he holds in regard to 
his family. Such information would be in- 
valuable to a clinician working with this 
person. 

Case II. By way of contrast we present the 
factor analysis of the RCRT protocol of a 
48-year-old female graduate student, mar- 
ried, and mother of two children. At the time 
of this study she described herself as con- 
Table 2 

Constructs and Orthogonal Loadings for Case II 

                                                                                      Factor loadings 
Construct                               Contrasting construct                           I   II  III   IV   h² 

 1. open mind, growing                  closed mind, “set in ways”                     91   25   00   30   98 
 2. plastic, vital                      religiously distorted, inadequate concepts     50   72  -05  -10   89 
 3. understanding                       inward looking, self-seeing only               96  -04  -14   06   95 
 4. confused                            on firm emotional foundation                  -18   12   80   00   69 
 5. vital in thinking                   conventional in thinking                       84   21   05   42   93 
 6. gentle                              severe, critical                               88   26   12  -02   86 
 7. masculine dominating                leading, not driving                          -93  -10   03  -02   88 
 8. likeable                            irritating                                     69   49  -03  -08   72 
 9. devoted to a purpose                takes life as it comes                         32   65   02   00   41 
10. not satisfied with taking other     does no individual thinking                    63   23   20   53   77 
    people’s thinking 
11. can talk about philosophical ideas  not interested in philosophical ideas          73   57   08   26   94 
12. broader outlook                     limited outlook                                85   32   00   41   99 
13. stolid                              high strung                                   -23   00  -71  -04   56 
14. resentful                           accepting                                     -97   03   02   02   94 
15. don’t see me as a person            sympathetic                                   -91  -03  -07   10   84 





fused about the future and considering a 
fairly radical vocational change. Without go- 
ing into the protocol in the same detail as 
the previous one, one is immediately struck 
by her concern about being accepted, about 
a Weltanschauung, and about emotional sta- 
bility. If one were working with this woman 
clinically, he would be particularly interested 
in the relationship between the construct 
“masculine dominating—leading, not driving” 
and the rest of the factor on which it is 
loaded. Does she indeed see males as domi- 
nating, “set in ways,” and generally unthink- 
ing? If so, how does this enter into her pres- 
ent difficulty? Does she see herself frustrated 
because this is essentially “a man’s world”? 
These are just a few of the questions and hy- 
potheses which a clinician would raise on the 
basis of this protocol. 


Discussion 


We have presented what we believe to be 
a useful and reliable approach to the “map- 
ping” of the individual cognitive structure. 
In its present form it would be impractical 
except for research purposes. However, Kelly 
(3) is developing a nonparametric equivalent 
to factor analysis which would permit one to 
factor a protocol in about one hour’s time. 
We would like in this section to sketch some 


of the research possibilities which we believe 
are opened up by this method. 

Not losing sight of the fact that the RCRT 
is a sorting test akin to other concept-forma- 
tion tests, one immediately wonders what the 
qualitative relationships might be between 
the constructs elicited by it and the con- 
cepts elicited by these tests. More specifically, 
since here we obtain a picture of the relation- 
ship between constructs, there are certain 
questions amenable to investigation for the 
first time. For example: What is the signifi- 
cance of the number of factors derived? Is 
there a relationship between number of fac- 
tors derived and complexity of cognitive 
structure? If so, what are the behavioral cor- 
relates of complexity of cognitive structure? 
Does the number of constructs having signifi- 
cant loadings on a given factor have any sig- 
nificance? One might suspect that the larger 
the number of constructs significantly loaded 
on a given factor, the more important that 
dimension is for the individual. Does fac- 
torial structure change along developmental 
lines? 

Aside from the content itself, there are cer- 
tain implications for clinical research. Since 
the factors derived are to a good extent the 
result of functional relationships which the 
individual himself has imposed upon his con- 
structs, one wonders at the extent to which 
he is aware of this functional relationship. 
As a corollary, one might be interested in the 
correlates of accuracy of awareness of func- 
tional relationships between constructs. Such 
relationships could be studied by having the 
individual attempt to sort his constructs into 
as many groups as there were factors, and 
determining the correspondence between his 
grouping and the factor grouping. Corre- 
spondence could be construed as one form 
of insight and might be expected to vary with 
personal adjustment, age, success or failure 
in psychotherapy, etc. The clinician, inter- 
ested as he is in problems of interpersonal 
relationship, could easily investigate such 
areas as transference, identification, etc., by 
means of an inverse analysis of the same 
protocol using the role titles as the variates. 
Here again it would be interesting to study 
developmental trends. 
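
The correspondence between the subject's own grouping of his constructs and the factor grouping is not tied to any particular index in the text. As one possible sketch, the pairwise agreement (Rand index) between the two groupings could be computed; the choice of this index, the function name, and the sample labels below are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Dict, Hashable


def pair_agreement(own_grouping: Dict[str, Hashable],
                   factor_grouping: Dict[str, Hashable]) -> float:
    """Rand index: share of construct pairs treated alike by both groupings.

    Each argument maps a construct name to a group label. A pair "agrees"
    when both groupings put the two constructs together, or both keep
    them apart.
    """
    if set(own_grouping) != set(factor_grouping):
        raise ValueError("both groupings must cover the same constructs")
    pairs = list(combinations(sorted(own_grouping), 2))
    if not pairs:
        raise ValueError("at least two constructs are required")
    agreements = sum(
        (own_grouping[a] == own_grouping[b])
        == (factor_grouping[a] == factor_grouping[b])
        for a, b in pairs
    )
    return agreements / len(pairs)


# Hypothetical example: four constructs, two groups in each sorting.
own = {"A": 1, "B": 1, "C": 2, "D": 2}
by_factor = {"A": "I", "B": "I", "C": "II", "D": "I"}
agreement = pair_agreement(own, by_factor)
```

A value of 1.0 would indicate that the subject's sorting reproduces the factor grouping exactly; lower values indicate less correspondence.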

Other research opportunities present them- 
selves in the fields of social and industrial 
psychology where one might be interested in, 
among other things, the commonality of con- 
structs among members of a given group, the 
relationship between nature and structure of 
cognitive structure and leadership ability, etc. 
The problem of communication could be ap- 
proached through the study of the personal 
constructs of the communicators. From the 
standpoint of communication, this technique 
provides a means of reproducing each indi- 
vidual’s personal coding scheme. One might 
study such problems as ability to receive and 
transmit information as a function of the 
content and structure of individual coding 
systems. 

One last point should be made with respect 
to the problem of validity. In one sense there 
is no such thing as an invalid test, since all 
tests measure something, and it becomes a 
question of semantics whether the right name 
has been applied to it; the more important 
question as we see it is pertinence. The re- 
sponses of an individual on the RCRT con- 
stitute a phenomenon; by means of factor 
analysis we impose a certain structure upon 
this phenomenon, and now the problem be- 
comes one of the pertinence of this phe- 
nomenon and this structure. The entire body 
of this discussion has been devoted to the 
description of a program which we believe 
would yield an answer to this question. To 
the extent that the information yielded by 
the RCRT and its factor analysis is found 
to be functionally related to other psycho- 
logical phenomena, the technique has perti- 
nence. In this spirit we offer the technique 
to the psychological public. 


Summary 


A method is described whereby factor ana- 
lytic techniques may be applied to the con- 
cepts formed by an individual on a sorting 
test called the Role Construct Repertory 
Test. Two cases are presented for illustrative 
purposes and the research implications of this 
method are discussed. 


Received May 9, 1955. 
Efficiency of Attitudes, Fantasies, and Life History 
Data in Predicting Observed Behavior¹ 


George W. Fairweather, Louis J. Moran, and 
Robert B. Morton 
VA Hospital, Houston, Texas 


This study investigates the efficiency with 
which three frequently used sources of in- 
formation, viz., attitudes, fantasies, and life 
histories, predict tuberculous patients’ overt 
ward behavior. 

In structured interviews, each of 140 pa- 
tients responded to 132 life history items 
and 45 attitude items. In addition, ten TAT 
cards were administered to 39 of the 140 pa- 
tients. Concurrently, all patients were rated 
by two nurses and two aides on 64 ward be- 
havior items. 

From this pool of items, three ward be- 
havior scales, three attitude scales, and two 
life history scales were constructed. The three 
ward behavior scales and the three attitude 
scales concerned adjustment to (a) regula- 
tions, (b) personnel, and (c) peers. The two 
life history scales concerned developmental 
adjustment to (a) control and (b) peers. 
Each response to every item in all scales was 
given an a priori adaptation score so that the 
higher the adaptive significance of a response 
the higher the score. The total score for any 
scale was the sum of the item values compris- 
ing that scale. 

The 39 TAT protocols were scored, by frequency 
count, on the following fantasy dimensions: 
(a) hostility, (b) clear or weak 


1 An extended report of this study and test mate- 
rials may be obtained without charge from Robert 
B. Morton, Veterans Administration Hospital, Hous- 
ton 31, Texas, or for a fee from the American Docu- 
mentation Institute. To obtain it from the latter 
source, order Document No. 4692 from ADI Aux- 
iliary Publications Project, Photoduplication Serv- 
ice, Library of Congress, Washington 25, D. C., 
remitting in advance $1.75 for microfilm or $2.50 
for photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 
perception of the parent, (c) aspiration, and 
(d) inactivity. In addition, 33 of the 140 
patients, who remained in the hospital 8 to 
15 months after the initial ward behavior 
ratings, were again rated on the same three 
behavior scales. 

A 12 by 12 tetrachoric correlation matrix 
was computed from the initial data. The 
matrix revealed that attitudes, life history 
data, and fantasy dimensions were not sig- 
nificantly related to the scores on the three 
ward behavior scales. However, the correla- 
tions among the three ward behavior scales 
were .66, .66, .76 (p< .001). Further, the 
follow-up ward behavior scores yielded the 
following product-moment correlations with 
the initial ward behavior scores: (a) adaptation 
to regulations .60 (p < .01), (b) adaptation 
to personnel .27 (p > .05), (c) adaptation 
to peers .59 (p < .01). 

These results indicate that the most effi- 
cient predictor of one aspect of current be- 
havior is another aspect of current behavior. 
For example, current adaptation to regula- 
tions predicts, with reasonable accuracy, the 
patients’ current adaptation to personnel and 
peers. Further, the significant correlations be- 
tween initial and follow-up behavior scales 
in the areas of adaptation to regulations and 
peers intimate that one aspect of behavior 
predicts, within moderate limits, subsequent 
similar behavior. More generically, the find- 
ings suggest that the closer in conceptual dis- 
tance the independent variable is to the de- 
pendent variable, the more accurately will 
the latter be predicted from the former. 


Brief Report 
Received September 12, 1955. 
Double Alternation: A Measure of Intelligence¹ 


Allen Hodges 


Southern Minnesota Mental Health Center 


Since its formulation by Hunter (5), the 
double alternation problem has been recog- 
nized as a measure of an implicit process 
designated as reasoning. The subject, whether 
animal or human, must respond in terms of 
temporal relations under conditions where no 
differential sensory cues exist. Because of its 
adaptability, versions of the double alterna- 
tion problem have been widely used in com- 
parative studies of intelligence in mammals 
(1, 4, 5, 7, 8). 

Four studies have been reported using the 
double alternation technique as a measure of 
intelligence with human subjects. With college 
students as subjects, Gellermann (2) reported 
correlations between solution of the 
problem and chronological age and mental 
age to be .28 ± .11 and .58 ± .09, respectively. 
Hunter and Bartlett (6), using children 
between the ages of 2 and 6, found a 
correlation of .86 ± .03 between the number of 
trials required to solve the problem and 
chronological age; a correlation of .81 ± .04 
was found between the number of trials and 
mental age. 

Stolurow and Pascal (10), using mental defectives 
as subjects, found double alternation 
performance to correlate .83 ± .03 with measured 
mental age, while no significant relationship 
was found with chronological age. 
Hodges (3), using public school children, reported 
biserial correlations of .78 ± .03 and 
.59 ± .05 between solution of the double alternation 
problem and CA and MA. 

From these studies with human subjects, 
the relationship between measured intelli- 


1The author is indebted to Drs. G. R. Pascal, 
E. O. Milton, and E. E. Cureton of the University 
of Tennessee for advice and guidance given in com- 
pleting the dissertation upon which this article is 
based. 


gence and double alternation behavior is evi- 
dent. The purpose of this paper is to equate 
measured mental age with performance on 
the double alternation problem. Because of 
the nonverbal qualities of double alternation 
behavior, a test of this behavior could be a 
valuable addition to the clinician’s test bat- 
tery. 
Method and Procedure 

Selection of Subjects 


A stratified sample of elementary school 
children was selected.² 


The following selective criteria were used: (a) Only 
male subjects were included. (b) All subjects were administered 
the Primary Mental Abilities Test (11). 
(c) Two hundred and forty subjects with PMA 
IQ’s ranging from 80 to 120 were subdivided into 
six chronological age groups. (d) While subjects 
were not individually selected on the basis of socioeconomic 
status, all subjects were drawn from school 
districts in which family earning power, educational 
level of parents and type of housing were comparable. 


Table 1 contains the chronological and men- 
tal age characteristics of the six age groups 
employed. 

Table 1 

Chronological and Mental Ages of the Subjects 

                    Chronological age       Mental age 
Age group     N       Mean       σ        Mean       σ 
6 to 7       40       6.54      .31       6.21      .84 
7 to 8       40       7.50      .28       7.59      .93 
8 to 9       40       8.41      .34       8.15      .77 
9 to 10      40       9.40      .39       9.25      .81 
10 to 11     40      10.39      .36      10.41      .63 
11 to 12     40      11.41      .24      11.42      .69 





2 All subjects were obtained from the Oak Ridge, 
Tennessee, Public Schools. 
The Double Alternation Card Test 


Each of the 240 subjects was administered 
the Double Alternation Card Test individu- 
ally. The Card Test was introduced by J. 
McV. Hunt (9). The only materials needed 
are five ordinary playing cards and a record 
sheet. Four of the five playing cards are black 
(either spades or clubs) while the fifth is red 
(heart or a diamond). The red card is desig- 
nated as the goal card. 

The five cards are arranged out of vision 
of the subject and placed face down on the 
examining table. The goal card is placed ac- 
cording to the double alternation sequence, 
twice on the left end and twice on the right 
end (LLRR). The subject is simply in- 
structed: “Find the red card.” 

The complete sequence of LLRR or four 
complete presentations comprises one trial. 
The subject’s errors are recorded as the sub- 
ject seeks the goal card. The test is con- 
tinued for thirty trials or until the subject 
successfully solves the problem by picking 
the goal card for two successive trials. 

Six quantitative scores were derived from 
the Card Test. 


1. Success or Failure. This is a dichotomy score of 
whether the subject obtains the criterion of two cor- 
rect successive trials before a total of 30 trials is 
presented. 


3In a forthcoming article, Dr. Hunt will describe 
the Card Test in more detail. 
2. Verbalization or Nonverbalization Score. If the 
subject successfully solves the problem, this score is 
whether the subject can verbalize the answer. With 
young children, the problem is scored as “Verbalized” 
if the child states: “You put the red card two times 
here (points correctly) and two times here (points 
correctly).” 

3. The Number of Trials Score. This score is the 
number of trials which the subject uses before ob- 
taining the criterion of success. 

4. The Number of Total Errors Score. This score 
is the total number of errors made by the subject 
before obtaining the solution, or the total errors 
made during the 30 trials. 

5. The Errors Prior to End Concept Score. In ob- 
serving subjects confronted with the double alterna- 
tion problem, their first behavior appears to be trial 
and error. After locating the red card on the ends, 
the subject recognizes that the goal card is always 
placed in the end-most positions. From this time on, 
the majority of subjects will turn only the end cards. 
This point at which the S turns only the end cards 
for two successive trials is designated as the de- 
velopment of the concept of “endedness.” The errors 
committed prior to achieving this concept comprise 
the Errors Prior to End Concept Score. 

6. The Errors after End Concept Score. This score 
is the number of errors committed after formation 
of the end concept. 


A seventh score is obtainable which is the 
total Time Score, or the time which elapses 
between the first presentation of the cards 
and the solution to the problem. Because the 
dexterity of the examiner in handling and 
distributing the cards contributes to the time 
involved in testing, the validity of such a 
score is questionable. 
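
The first, third, and fourth scores described above can be sketched in a few lines. This is only an illustration under stated assumptions: a trial is treated as correct when no errors were recorded across its four presentations, and the Number of Trials Score is taken as the count of trials preceding the two criterion trials. Neither convention is spelled out in the text, and the names used are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_TRIALS = 30      # the test is discontinued after thirty trials
CRITERION_RUN = 2    # two successive correct trials count as solution


@dataclass
class CardTestScores:
    solved: bool        # Score 1: Success or Failure
    trials: int         # Score 3: Number of Trials
    total_errors: int   # Score 4: Number of Total Errors


def score_card_test(errors_per_trial: List[int]) -> CardTestScores:
    """Score a record of error counts, one entry per LLRR trial.

    Assumption: a trial with zero errors is a correct trial.
    """
    record = errors_per_trial[:MAX_TRIALS]
    run = 0
    for index, errors in enumerate(record):
        run = run + 1 if errors == 0 else 0
        if run == CRITERION_RUN:
            first_criterion_trial = index - CRITERION_RUN + 1
            return CardTestScores(
                solved=True,
                trials=first_criterion_trial,
                total_errors=sum(record[:first_criterion_trial]),
            )
    # Failure: all thirty trials were used; errors are summed over them.
    return CardTestScores(False, MAX_TRIALS, sum(record))


# Hypothetical record: the subject solves on the fifth and sixth trials.
example = score_card_test([3, 2, 0, 1, 0, 0])
```

The two end-concept scores would additionally require a record of which cards were turned on each presentation, which this sketch does not model.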


Table 2 

Correlation Summary 

Measures                                    1        2        3        4        5        6        7 
1. Solution vs. failure 
2. Verbalization vs. nonverbalization     .800* 
3. Number of trials                      −.680*   −.760* 
4. Total number of errors                −.633†   −.595†    .894† 
5. Errors prior to end concept           −.219†   −.276†    .344†    .765 
6. Errors after end concept              −.579†   −.468†    .702†    .136     .149 
7. Mental age                             .597†    .536    −.543†   −.743    −.222    −.487 

* Tetrachoric coefficient. 
† Biserial coefficient. 
Table 3 

Mean Scores of the Six Double Alternation Measures for Each Mental Age Group 

                  Solution    Verbalization or    Mean number   Mean total number   Mean errors prior   Mean errors after 
MA          N    or failure   nonverbalization     of trials        of errors         to end concept       end concept 
14 to 15    6        S               V                5.8             18.3                14.8                 3.8 
13 to 14    7        S               V                9.3             20.3                16.4                 8.1 
12 to 13   16        S               V               11.6             29.7                14.5                12.8 
11 to 12   32        S               V               12.4             35.8                20.2                16.3 
10 to 11   35        S               NV              18.2             55.9                32.2                24.4 
9 to 10    37        S               NV              19.7             54.2                22.2                25.1 
8 to 9     46        S               NV              21.5             61.9                36.7                31.6 
7 to 8     36        S/F             NV              21.7             58.7                24.7                34.2 
6 to 7     21        F               NV              26.7             87.3                47.0                41.1 
5 to 6      4        F               NV              30.0             99.3                39.2                60.0 

Results 


Table 2 contains the intercorrelation sum- 
mary of the six scores on the Card Test and 
the criterion mental age. 

Solution vs. Failure, Total Number of 
Errors, and Errors after End Concept Scores 
were selected as having the highest correla- 
tion with the criterion variable and lowest in- 
tercorrelations with the other variables. A 
multiple regression equation was computed 
by the Doolittle method using these three 
scores. The result was: 


Estimated mental age = .24 (X1) − .49 (X2) − .09 (X3) + C 


Where: X1 = Solution vs. Failure Score (Solve = +.65, Fail = −.97)⁴ 
       X2 = The square root of the Total Number of Errors Score⁵ 
       X3 = The square root of the Errors after End Concept Score⁵ 
       C  = 12.15 


This equation represents the best estimate 
of mental age obtainable for the three Card 


4 To determine the scores to be assigned for Solution 
vs. Failure, the values corresponding to P, or 
the percentage of solution, and Q, which is the percentage 
of failure, for the mean deviation of one tail 
of a normal distribution were computed. 

5 The square roots of these scores are necessary 
because of a square root transformation previously 
applied to the data. 


Test scores. The use of this equation is lim- 
ited, however, because of the restricted range 
of ability which it encompasses; i.e., predict- 
ing mental ages below a twelve-year level. In 
addition, the need for cross validation is evi- 
dent. When this equation is applied to a new 
group of subjects, considerable shrinkage may 
occur which would alter the predictive value 
of the multiple regression equation. 
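
As a worked illustration of the regression equation given above, the following sketch applies the published weights to one subject's scores. The function and argument names are hypothetical; the coefficients, the constant, and the Solve/Fail values are those stated in the equation.

```python
import math

SOLVE_VALUE = 0.65    # value assigned to X1 when the problem is solved
FAIL_VALUE = -0.97    # value assigned to X1 when the problem is failed
CONSTANT = 12.15      # C in the equation


def estimated_mental_age(solved: bool,
                         total_errors: float,
                         errors_after_end_concept: float) -> float:
    """Estimated MA = .24*X1 - .49*X2 - .09*X3 + C.

    X2 and X3 are the square roots of the raw error scores, as the
    equation requires.
    """
    x1 = SOLVE_VALUE if solved else FAIL_VALUE
    x2 = math.sqrt(total_errors)
    x3 = math.sqrt(errors_after_end_concept)
    return 0.24 * x1 - 0.49 * x2 - 0.09 * x3 + CONSTANT


# A subject who solves without error receives the highest possible estimate.
ceiling = estimated_mental_age(True, 0, 0)
# Hypothetical failing subject with 100 total errors, 64 after end concept.
low = estimated_mental_age(False, 100, 64)
```

The ceiling value, a little over twelve years, shows concretely why the equation cannot predict mental ages above the twelve-year level.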

To increase the clinical usefulness of the 
Card Test, mean scores have been compiled 
which equate scores on the Card Test’s six 
measures with mental age. These means are 
included in Table 3. 


Summary and Conclusions 


1. The purpose of this paper was to equate 
performance on the Double Alternation Card 
Test with measured mental age. Using 240 
public school males ranging in CA from 6 to 
12 years, mental age equivalents for six 
quantitative scores of the Card Test were 
computed. 

2. A multiple regression equation predict- 
ing mental age was derived. The usage of this 
equation appears limited because of the re- 
stricted mental age range which can be pre- 
dicted by means of it. 

3. Mean scores for performance on six 
measures of the Card Test, equated with 
PMA mental ages, are included as very ten- 
tative norms. 

4. Because of the nonverbal aspects of the 
Card Test plus the high correlations between 
measures on this test and measured mental 
age, it is felt that this technique can be a 
valuable addition to the clinician’s test bat- 
tery, particularly in instances where the 
testee suffers language handicaps, aphasia, or 
speech disorders. 


Received May 13, 1955. 
Estimating the Full Scale Score on the Wechsler 
Adult Intelligence Scale from Scores 
on Four Subtests 


Jerome E. Doppelt 
The Psychological Corporation 


Numerous articles have appeared in the lit- 
erature on abbreviated forms of the Wechsler- 
Bellevue Scale. The short forms have usually 
been based on data from samples which were 
not at all typical of the normal population. 
In 1950, McNemar (2) reported a study on 
abbreviated Wechsler-Bellevue Scales which 
was based on the data Wechsler (3) reported 
for his standardization population. McNemar 
gave the correlations between total score on 
the Wechsler-Bellevue and the best pairs, 
triads, quartets, and quintets of subtests. He 
makes the point that “there can be no doubt 
about the greater dependability of the corre- 
lations based on Wechsler’s group,” that is, 
the standardization sample for the W-B Scale, 
than on attempts with less representative 
groups. 

Since February 1955, the Wechsler Adult 
Intelligence Scale (WAIS), the revision of 
the Wechsler-Bellevue Scale, has been avail- 
able. The WAIS consists of six verbal and 
five performance subtests, and like its prede- 
cessor it yields Verbal, Performance, and Full 
Scale IQ’s according to age group. There are 
many instances when users of the WAIS 
would want a reasonably accurate estimate of 
a subject’s IQ without giving all eleven sub- 
tests of the Scale. This study is an investiga- 
tion of the effectiveness of a subgroup of 
tests in predicting the Full Scale Score, which 
is the sum of scores on the eleven tests. 

Following McNemar’s suggestion, the data 
used were those gathered in the national 
standardization of the WAIS and reported in 
the Manual (4). These data are based on 
samples reasonably representative of the 
population of the United States.¹ 

The decision as to the number and type of 
subtests to be included in the predictor group 
is, to some extent, arbitrary. A compromise 
must be made between economy of time and 
effort and accuracy of prediction. It was de- 
cided to select the group of four subtests 
which correlates highest with Full Scale 
Score. Although prediction of the Full Scale 
Score was the goal, it was felt that the best 
approach would be to select the two verbal 
subtests which are most highly correlated 
with total Verbal Score and the two perform- 
ance measures which are the best predictors 
of the total Performance Score. 

There are fifteen possible pairs of the six 
verbal subtests of the WAIS and the correla- 
tion of each pair with the Verbal Score (total 
of scores on the six tests) was computed for 
three different age groups: 18-19, 25-34, and 
45-54. The coefficients are reported, in order 
of magnitude, in Table 1. For the ten pairs of 
performance tests the correlation coefficients 
between each pair and Performance Score are 
reported for the same age groups in Table 2.² 
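
The counts of fifteen verbal pairs and ten performance pairs follow from choosing two subtests at a time from six and from five. A brief sketch that enumerates them is given below; the variable names are illustrative, and the subtest names are those abbreviated in Tables 1 and 2.

```python
from itertools import combinations

VERBAL_SUBTESTS = [
    "Information", "Comprehension", "Arithmetic",
    "Similarities", "Digit Span", "Vocabulary",
]
PERFORMANCE_SUBTESTS = [
    "Digit Symbol", "Picture Completion", "Block Design",
    "Picture Arrangement", "Object Assembly",
]

# Every unordered pair of subtests within each scale.
verbal_pairs = list(combinations(VERBAL_SUBTESTS, 2))
performance_pairs = list(combinations(PERFORMANCE_SUBTESTS, 2))

print(len(verbal_pairs), len(performance_pairs))  # prints "15 10"
```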

Tables 1 and 2 show that many of the 
pairs of tests are highly correlated with the 
corresponding total scores and the differences 




1 Full description of the sampling procedure used 
in the collection of the data is given in the Manual 
(4, pp. 5-11). 

2 The basic data for these calculations are given in 
Tables 7-9 of the WAIS Manual (4). For greater 
accuracy in computation the intertest coefficients to 
four decimal places were used. These data were avail- 
able in the files of the publisher of the WAIS. 







Table 1 

Correlations Between Two Verbal Tests and Total Verbal Score 

Ages 18-19                    Ages 25-34                    Ages 45-54 
Tests                 r       Tests                 r       Tests                 r 
Inf., Voc.          .940      Inf., Voc.          .940      Arith., Voc.        .948 
Arith., Voc.        .938      Inf., Sim.          .936      Inf., Comp.         .941 
Sim., Voc.          .938      Arith., Voc.        .934      Inf., Voc.          .940 
Inf., Sim.          .936      Sim., Voc.          .924      Inf., Arith.        .939 
Comp., Voc.         .933      Inf., Comp.         .921      Comp., Voc.         .935 
Comp., Sim.         .931      Comp., Arith.       .914      Arith., Sim.        .933 
Inf., Arith.        .930      Inf., Arith.        .913      Comp., Arith.       .930 
Arith., Sim.        .929      Comp., Voc.         .912      Sim., Voc.          .930 
Inf., Comp.         .925      Dig. Sp., Voc.      .912      Inf., Sim.          .930 
Inf., Dig. Sp.      .925      Arith., Sim.        .911      Dig. Sp., Voc.      .928 
Sim., Dig. Sp.      .914      Comp., Sim.         .908      Comp., Sim.         .920 
Dig. Sp., Voc.      .912      Inf., Dig. Sp.      .905      Inf., Dig. Sp.      .918 
Comp., Arith.       .911      Comp., Dig. Sp.     .894      Comp., Dig. Sp.     .909 
Comp., Dig. Sp.     .906      Sim., Dig. Sp.      .891      Sim., Dig. Sp.      .903 
Arith., Dig. Sp.    .881      Arith., Dig. Sp.    .853      Arith., Dig. Sp.    .865 








between the coefficients for successive pairs 
in any age group are very small. The coeffi- 
cients are, of course, influenced by the fact 
that each pair of tests is included in the total 
with which it is correlated. However, the pur- 
pose here is to predict the total on the basis 
of part of the total and consequently this 
“spurious” nature of the coefficients is of no 
concern. 

In Table 1, the Information and Vocabu- 
lary subtests constitute the best pair for two 
of the age groups and the third best pair for 
one group. The pair which includes Arith- 
metic and Vocabulary is in second, third, and 


first position, respectively, for the three age 
groups. Since the difference between these 
“best” pairs is negligible, the choice of the 
pair to use in a regression equation was open 
to other than purely statistical considerations. 
Arithmetic and Vocabulary were selected as 
the pair of verbal subtests to use because 
they represent a variety of content materials. 
In particular, the inclusion of the arithmetic 
content provides a desirable range of situa- 
tions for subjective observations in the testing 
session without loss of statistical accuracy. 
The Block Design and Picture Arrangement 
pair is most highly correlated with Performance 
Score in all three age groups. This pair 
was selected to represent the performance 
tests. 


Table 2 

Correlations Between Two Performance Tests and Total Performance Score 

Ages 18-19                        Ages 25-34                        Ages 45-54 
Tests                     r       Tests                     r       Tests                     r 
Bl. Des., P. Arr.       .939      Bl. Des., P. Arr.       .917      Bl. Des., P. Arr.       .926 
P. Arr., Obj. Assem.    .920      Dig. Sym., Bl. Des.     .917      P. Arr., Obj. Assem.    .926 
P. Comp., Bl. Des.      .920      P. Comp., Bl. Des.      .914      P. Comp., Obj. Assem.   .920 
Bl. Des., Obj. Assem.   .920      P. Comp., Obj. Assem.   .909      P. Comp., Bl. Des.      .919 
Dig. Sym., Bl. Des.     .918      P. Arr., Obj. Assem.    .906      Dig. Sym., Obj. Assem.  .919 
P. Comp., Obj. Assem.   .916      P. Comp., P. Arr.       .906      Dig. Sym., Bl. Des.     .915 
P. Comp., P. Arr.       .913      Bl. Des., Obj. Assem.   .902      Dig. Sym., P. Comp.     .914 
Dig. Sym., P. Comp.     .912      Dig. Sym., P. Comp.     .900      P. Comp., P. Arr.       .905 
Dig. Sym., Obj. Assem.  .906      Dig. Sym., Obj. Assem.  .897      Dig. Sym., P. Arr.      .887 
Dig. Sym., P. Arr.      .887      Dig. Sym., P. Arr.      .880      Bl. Des., Obj. Assem.   .886 

The sum of the scaled scores on the four 
selected subtests—Arithmetic, Vocabulary, 
Block Design, and Picture Arrangement— 
was correlated with the Full Scale Score for 
each of the three age groups shown in Tables 
1 and 2. The computation was done by the 
following formula with all the terms available 
from the tables of intercorrelations: 


\[
r_{T(a+b+c+d)} = \frac{r_{Ta}\sigma_a + r_{Tb}\sigma_b + r_{Tc}\sigma_c + r_{Td}\sigma_d}{\sigma_{(a+b+c+d)}}
\]

The subscripts a, b, c, and d identify the four 
subtests. T identifies the Full Scale Score. 
The denominator was computed as follows: 

\[
\sigma_{(a+b+c+d)} = \sqrt{\sum_{i=a}^{d} \sigma_i^2 + 2 \sum_{i<j} r_{ij}\sigma_i\sigma_j} \quad (i \neq j).
\]

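
The computation just described, the correlation of a sum of subtests with the total obtained from the subtest standard deviations and intercorrelations, can be sketched as follows. The function name and the numerical values in the example are hypothetical; they are not taken from the WAIS Manual.

```python
import math
from typing import Sequence


def correlation_of_sum_with_total(r_with_total: Sequence[float],
                                  sigmas: Sequence[float],
                                  r_between: Sequence[Sequence[float]]) -> float:
    """Correlation between a sum of subtests and the total score T.

    r_with_total[i] : correlation of subtest i with T
    sigmas[i]       : standard deviation of subtest i
    r_between[i][j] : correlation between subtests i and j
    """
    k = len(sigmas)
    numerator = sum(r * s for r, s in zip(r_with_total, sigmas))
    # Variance of the sum: individual variances plus twice each covariance.
    variance = sum(s * s for s in sigmas)
    for i in range(k):
        for j in range(i + 1, k):
            variance += 2 * r_between[i][j] * sigmas[i] * sigmas[j]
    return numerator / math.sqrt(variance)


# Hypothetical values: four subtests, each with SD 3, intercorrelations .5,
# and a correlation of .7 with the total.
r_ij = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
r_sum = correlation_of_sum_with_total([0.7] * 4, [3.0] * 4, r_ij)
```

With these made-up inputs the result is about .885, showing how four moderately related subtests together can correlate with the total more highly than any one of them does.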
In the process of obtaining norms on the 
WAIS for older people, four “old-age” groups 
had been obtained and the intercorrelations 
among the WAIS tests had been computed 
(1). For each of these four groups the cor- 
relation between the selected four tests and 
the Full Scale Score was determined. The re- 
sulting coefficients and the regression equa- 
tions for predicting Full Scale Score from the 
sum of four tests are shown in Table 3 for 
the three age groups in the national stand- 
ardization and for the four old-age groups. 
It may be noted from Table 3 that the 
slopes of the regression lines are very similar 
in the seven age groups. Indeed, it seemed 
practical to use 2.5 as the coefficient of the 


Table 3 

Correlation of Sum of Four Subtests with Full Scale 
Score and Regression Equation for 
Predicting Full Scale Score 

                                 Regression equation 
                                 (x1 = Full Scale Score; 
Age group        N       r       x2 = sum of 4 tests) 
18-19           200    .960      x1 = 2.5x2 + 9.7 
25-34           300    .954      x1 = 2.4x2 + 13.0 
45-54           300    .958      x1 = 2.5x2 + 7.5 
60-64           101    .968      x1 = 2.3x2 + 12.2 
65-69            86    .963      x1 = 2.3x2 + 10.7 
70-74            80    .957      x1 = 2.4x2 + 7.9 
75 and over      85    .962      x1 = 2.6x2 + 0.2 








Table 4 

Estimation of Full Scale Score from Sum of Scores on 
Arithmetic, Vocabulary, Block Design, 
and Picture Arrangement 

Age group        Full Scale Score = 2.5 S* + 
16-17            10 
18-19            10 
20-24            10 
25-34            10 
35-44             9 
45-54             8 
55-64             7 
65-69             5 
70-74             5 
75 and over       4 

* S = sum of scaled scores on 4 tests (use Table 17 of WAIS 
Manual). 


predicting variable in each of the equations 
and to adjust the constant term according to 
age of the individual. In this way a regression 
equation could be determined for each age 
group, including those for which tables of intercorrelations 
of the tests had not been prepared. 
This table, with the constants given 
as whole numbers, is presented as Table 4. 
Thus, an examiner who obtains the scaled 
scores on Arithmetic, Vocabulary, Block De- 
sign, and Picture Arrangement may estimate 
the Full Scale Score by multiplying the sum 
of the four scores by 2.5 and adding a con- 
stant in accordance with the subject’s age. 
An estimate of the Full Scale IQ may then 
be obtained by entering the IQ tables given 
in the WAIS Manual. 
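
The procedure in Table 4 amounts to a table lookup followed by one multiplication and one addition. A minimal sketch is given below; the function and variable names are hypothetical, and the age-group constants are those of Table 4. Conversion of the estimated score to an IQ still requires the tables of the WAIS Manual and is not attempted here.

```python
from typing import Optional, Sequence, Tuple

# (lowest age, highest age or None for "and over", constant) from Table 4.
AGE_GROUP_CONSTANTS: Sequence[Tuple[int, Optional[int], int]] = [
    (16, 17, 10), (18, 19, 10), (20, 24, 10), (25, 34, 10),
    (35, 44, 9), (45, 54, 8), (55, 64, 7),
    (65, 69, 5), (70, 74, 5), (75, None, 4),
]


def estimate_full_scale_score(scaled_scores: Sequence[float], age: int) -> float:
    """Estimate the Full Scale Score as 2.5 * S plus the age-group constant.

    scaled_scores holds the scaled scores on Arithmetic, Vocabulary,
    Block Design, and Picture Arrangement.
    """
    if len(scaled_scores) != 4:
        raise ValueError("exactly four scaled scores are required")
    for low, high, constant in AGE_GROUP_CONSTANTS:
        if age >= low and (high is None or age <= high):
            return 2.5 * sum(scaled_scores) + constant
    raise ValueError("Table 4 gives no constant for ages below 16")


# Hypothetical subject aged 50 with scaled scores of 12, 11, 9, and 8.
estimate = estimate_full_scale_score([12, 11, 9, 8], age=50)  # 2.5 * 40 + 8
```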

In making any prediction one should al- 
ways have some idea of the error involved. 
In this study the standard deviation of Full 
Scale Scores is about 25 and the correlation 
coefficient between the sum of the four se- 
lected tests and Full Scale Score is approxi- 
mately .96. Consequently the standard error 
of estimate in predicting Full Scale Score 
from the four tests is about 7 scaled score 
points. (This is equivalent to 4.2 IQ points.) 
Thus an estimated Full Scale Score would be 
within 7 scaled score points of the actual 
score about 68 out of 100 times. 
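
The figure of 7 points follows from the usual formula for the standard error of estimate, the standard deviation multiplied by the square root of one minus the squared correlation. A short check using the rounded values quoted above (the function name is illustrative):

```python
import math


def standard_error_of_estimate(sd: float, r: float) -> float:
    """Standard error of estimate: sd * sqrt(1 - r**2)."""
    return sd * math.sqrt(1.0 - r * r)


# SD of Full Scale Scores about 25; correlation approximately .96.
se = standard_error_of_estimate(25.0, 0.96)  # 25 * sqrt(.0784) = 25 * .28
```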

As a check on the equations given in Table 
4, distributions of the actual differences be- 
tween the obtained and estimated Full Scale 
Table 5 

Distribution of Differences Between Full Scale 
Score and Estimated Full Scale Score 

                     Ages 20-24         Ages 35-44 
Difference*           f       %          f       % 
Over 21               1      0.5 
15 to 21              3      1.5         2      0.7 
8 to 14              25     12.5        39     13.0 
0 to 7               70     35.0       115     38.3 
−7 to −1             72     36.0        98     32.7 
−14 to −8            24     12.0        37     12.3 
More than −14         5      2.5         9      3.0 
N                   200                300 
Mean               −0.1               −0.5 




* Full Scale Score minus the estimated Full Scale Score 
obtained from Table 4. 


Scores were prepared for age groups 20-24 
and 35-44. These two groups had not been 
used in the correlation analysis which led to 
the regression equations although they were 
used in the determination of age-group con- 
stants. 

The distribution of differences, grouped in 
terms of the suggested standard error of esti- 
mate of 7 points, is shown in Table 5. It may 
be noted that 71 per cent of the differences 
are contained within one standard error (± 7 
points) and 96 per cent are within two standard 
errors (± 14 points) in each age group. 
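
The percentages quoted here can be re-derived from the frequencies in Table 5, as in the sketch below. The dictionary layout and names are illustrative. The frequency of 37 for the −14 to −8 interval at ages 35-44 is an inference from the column total of 300, since that cell is illegible in the available copy.

```python
# Frequencies from Table 5, keyed by difference interval.
FREQUENCIES = {
    "20-24": {"over 21": 1, "15 to 21": 3, "8 to 14": 25, "0 to 7": 70,
              "-7 to -1": 72, "-14 to -8": 24, "below -14": 5},
    # 37 is inferred from N = 300 (assumption; see the note above).
    "35-44": {"over 21": 0, "15 to 21": 2, "8 to 14": 39, "0 to 7": 115,
              "-7 to -1": 98, "-14 to -8": 37, "below -14": 9},
}

WITHIN_ONE_SE = ("0 to 7", "-7 to -1")
WITHIN_TWO_SE = WITHIN_ONE_SE + ("8 to 14", "-14 to -8")


def percent_within(group: str, intervals) -> float:
    """Percentage of cases in the age group falling in the given intervals."""
    counts = FREQUENCIES[group]
    return 100.0 * sum(counts[name] for name in intervals) / sum(counts.values())


summary = {
    group: (percent_within(group, WITHIN_ONE_SE),
            percent_within(group, WITHIN_TWO_SE))
    for group in FREQUENCIES
}
```

Both age groups give 71 per cent within one standard error and about 96 per cent within two, in agreement with the text.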


Summary 


An abbreviated form of the Wechsler Adult 
Intelligence Scale was determined by select- 


ing the two best predictors of the total Verbal 
Score and the two best predictors of the total 
Performance Score. The four tests were Arith- 
metic, Vocabulary, Block Design, and Picture 
Arrangement. The selection of tests was based 
on the data obtained in the national stand- 
ardization of the WAIS. 

The correlation coefficients between the 
sum of scaled scores on the four tests and the 
Full Scale Score varied between .95 and .96 
for the seven age groups studied. A simplified 
regression equation for predicting Full Scale 
Score was presented in which the sum of 
scaled scores on the four selected tests is 
multiplied by 2.5 and a constant is added 
depending upon the subject’s age. 

The selection of subtests made in this study 
permits an estimate of Full Scale Score after 
35-40 minutes of testing. The standard error 
of estimate of Full Scale Scores thus obtained 
is about 7 standard score points. 


Received November 15, 1955. 
Early Publication. 
The Wechsler-Bellevue and Psychiatric Diagnosis: 
A Factor Analytic Approach 


George H. Frank¹ 


Florida State University² 


Although considerable research has been 
conducted to determine the efficacy of the 
Wechsler-Bellevue Intelligence Test (5) in 
predicting psychiatric diagnoses (2, 3), the 
results have been disappointing. By and 
large, the hypothesis that the W-B can elicit 
more than intellectual factors has not been 
empirically validated. 

The usual method of testing this hypothesis 
is to select two distinct samples, either two 
clinical groups, or a clinical and a so-called 
“normal” population, and compare the group 
subtest performance by t or r. Through the 
use of such techniques the researcher is af- 
forded only a gross measure of disparity or 
similarity without any further information as 
to the reasons for the results. 

An interesting innovation occurred to the 
writer which might provide a measure of com- 
parability but, further, offer possible rationale 
for the similarity. It was reasoned thus: if 
one were to take the subtest scores of indi- 
viduals from several diagnostic groups and 
throw them into one correlation matrix and 
factor-analyze the matrix, the subjects with 
similar subtest patterns would be grouped, 
and the reasons for their groupings could be 
ascertained from other identifying data. 


Method 


In a previous research (1), analyses of 
variance of the subtest scores of subjects sub- 


1 Based in part on a paper read at the Southeast- 
ern Psychological Association, in Atlanta, on May 
23, 1955. 

2 Now at Topeka State Hospital, but the writer 
would like to acknowledge the assistance of Prof. 
Bob McGinnis, now of the University of Wisconsin, 
then Director of the Sociological Research Labora- 
tory of Florida State University. 
sumed under thirty different diagnostic categories 
were performed. This analysis demonstrated 
that only nine of these constituted 
homogeneous groupings of subtest scores, while 
the 21 others proved to be statistically unrelated. 

A further check on the homogeneity of 
these nine groupings of W-B subtest scores 
was made by computing W’s (a method of 
multiple-rank correlation) for each of the 
nine groups. The W’s were high (above .90) 
and significant. Then the subtest scores of 
the sixty subjects from these nine different 
psychiatric groups were intercorrelated, one 
with the other, by the Spearman method of 
rank correlation, and the matrix analyzed 
by Thurstone’s centroid method of factor 
analysis. 
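
The inverse (person-by-person) analysis described above can be illustrated with a minimal sketch. It assumes that each subject's profile of subtest scores is correlated with every other subject's profile by Spearman's method, and that the first centroid factor is then taken as the column sums of the matrix divided by the square root of the grand sum, with each diagonal cell estimated by the largest absolute correlation in its column. The profiles and all function names are hypothetical and are not taken from the study; sign reflection and the extraction of further factors are omitted.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def subject_correlations(profiles):
    """Correlate persons (rows) with one another across their subtest scores."""
    n = len(profiles)
    return [[1.0 if i == j else spearman(profiles[i], profiles[j])
             for j in range(n)] for i in range(n)]


def first_centroid_loadings(r):
    """First centroid loadings: column sums over the square root of the grand sum.

    Assumption: each diagonal cell is replaced by the largest absolute
    off-diagonal correlation in its column. The grand sum must be positive.
    """
    n = len(r)
    m = [row[:] for row in r]
    for j in range(n):
        m[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    column_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [s / sqrt(sum(column_sums)) for s in column_sums]


# Hypothetical scores on 11 subtests for four subjects (not data from the study).
profiles = [
    [10, 12, 9, 11, 13, 8, 7, 9, 10, 6, 8],
    [11, 13, 10, 12, 14, 9, 8, 10, 11, 7, 9],
    [9, 11, 10, 12, 12, 7, 8, 8, 11, 6, 7],
    [12, 10, 11, 9, 13, 10, 6, 9, 8, 7, 9],
]
r = subject_correlations(profiles)
for subject, loading in enumerate(first_centroid_loadings(r), start=1):
    print(f"subject {subject}: first-factor loading {loading:.3f}")
```

Subjects with similarly shaped profiles receive similar loadings, which is what allows the groupings to be compared afterward on diagnosis, age, or IQ.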

Since the data originally came from Rapa- 
port’s work (4), it was a simple task to turn 
to the data for the identifying characteristics 
of the factor loadings. Thus those high in a 
given factor were compared on the basis of 
their diagnosis, age, sex, education, verbal, 
performance, and Full Scale IQ’s to determine 
the communality. 


Results 


Because of the cumbersomeness of a 60 × 60 
correlation table, the original matrix is not 
presented here. However, Table 1 presents 
the factor loadings of the sixty subjects on 
the two factors isolated by the analysis of the 
correlation matrix. 

The question now arises as to the meaning 
of the factor loadings. The immediate conclusion 
proffered by the data is that psychiatric 
diagnosis is not the common element of 
either one of the factor loadings. In fact the 
communality of the first factor loading appeared 
to be Verbal IQ, and that of the second, 
Performance IQ. 


Table 1 
Table of Loadings on the Two Factors Isolated by the Analysis 

Diagnosis                                Subject   Factor I   Factor II 
Neurasthenia                                   1       .721        .106 
Neurasthenia                                   2       .703        .281 
Neurasthenia                                   3       .611        .080 
Neurasthenia                                   4       .080        .125 
Neurasthenia                                   5       .693        .180 
Neurasthenia                                   6       .436        .265 
Deteriorated unclassified schizophrenia        7       .812        .261 
Deteriorated unclassified schizophrenia        8       .303        .167 
Deteriorated unclassified schizophrenia        9       .147        .176 
Deteriorated unclassified schizophrenia       10       .730        .285 
Deteriorated unclassified schizophrenia       11       .134        .074 
Deteriorated unclassified schizophrenia       12       .382        .561 
Deteriorated unclassified schizophrenia       13       .453        .114 
Involutional depression                       14       .557        .487 
Involutional depression                       15       .653        .062 
Involutional depression                       16       .733        .308 
Involutional depression                       17       .494        .095 
Involutional depression                       18       .661        .211 
Involutional depression                       19      -.046        .531 
Involutional depression                       20       .786        .065 
Maladjusted patrol                            21      -.336        .163 
Maladjusted patrol                            22       .032        .040 
Maladjusted patrol                            23       .106        .085 
Maladjusted patrol                            24       .323        .249 
Maladjusted patrol                            25       .226        .302 
Anxiety and depression                        26       .567        .135 
Anxiety and depression                        27       .636        .321 
Anxiety and depression                        28       .829        .350 
Anxiety and depression                        29       .319        .109 
Anxiety and depression                        30       .807         All 
Anxiety and depression                        31       .829        .393 
Anxiety and depression                        32       .655        .573 
Anxiety and depression                        33       .745        .141 
Anxiety and depression                        34       .471        .579 
Anxiety and depression                        35      -.046        .234 
Deteriorated paranoid schizophrenia           36       .520        .211 
Deteriorated paranoid schizophrenia           37       .581        .071 
Deteriorated paranoid schizophrenia           38       .491        .040 
Deteriorated paranoid schizophrenia           39       .630        .131 
Deteriorated paranoid schizophrenia           40      -.246        .288 
Mixed neurosis                                41       .536        .183 
Mixed neurosis                                42       .707        .105 
Mixed neurosis                                43       .733        .164 
Mixed neurosis                                44      -.094        .114 
Mixed neurosis                                45       .835        .149 
Mixed neurosis                                46       .542        .467 
Mixed neurosis                                47      -.220        .752 
Mixed neurosis                                48       .352        .569 
Mixed neurosis                                49       .257        .115 
Acute paranoid schizophrenia                  50       .248        .224 
Acute paranoid schizophrenia                  51       .670        .226 
Acute paranoid schizophrenia                  52       A5S7        .343 
Acute paranoid schizophrenia                  53      -.039        .387 
Acute paranoid schizophrenia                  54       .708        .309 
Acute paranoid schizophrenia                  55       .654        .112 
Acute paranoid schizophrenia                  56       .229        .278 
Acute paranoid schizophrenia                  57       .652        .258 
Acute paranoid schizophrenia                  58       .572        .151 
Acute paranoid schizophrenia                  59       .606        .233 
Acute paranoid schizophrenia                  60       .320        .198 


Discussion 


The results seem to suggest that the hy- 
potheses underlying the use of the Wechsler- 
Bellevue in a psychiatric setting must be 
seriously questioned. The test proves to sort 
the subjects not in terms of psychiatric char- 
acteristics, either a general factor of emo- 
tional maladjustment, a factor of psychosis 
or neurosis, or individual diagnostic category, 
but in terms of intellectual factors only. In 
all likelihood, it would appear that the Wechsler-Bellevue 
should be restricted to such use, 
i.e., as a test of intelligence. 


Summary 


In an attempt to test the efficacy of the use 
of the Wechsler-Bellevue in a psychiatric set- 
ting, the subtest performance of sixty subjects 
from nine different psychiatric groups was in- 
tercorrelated and factor-analyzed. The result- 
ant analysis yielded two factors, neither of 
which isolated the subjects in terms of psy- 
chiatric variables, i.e., a general factor of mal- 
adjustment, a factor of neurosis or psychosis, 
or diagnostic category, but in terms of IQ 
scores on the verbal and performance parts 
of the test. In light of these results, it seems 
apparent that the Wechsler-Bellevue does not 
yield significant data as regards psychiatric 
diagnosis, and continues to sort subjects in 
terms of intellectual factors only. The use of 
the Wechsler-Bellevue in a psychiatric setting 
is once more questioned, and its restricted 
use as a test of intelligence re-emphasized. 


Received June 6, 1955. 


Anxiety Level and Pursuitmeter Performance¹ 


Ruth G. Matarazzo and Joseph D. Matarazzo 
Massachusetts General Hospital and Harvard Medical School 


The Ss were 80 white, male, VA inpatients 
ranging in age from 18 to 37. The measure of 
anxiety employed was the Taylor Manifest 
Anxiety Scale. Five anxiety groups were used, 
Taylor raw scores of 1 to 8 being given an 
anxiety rating of 1, 9 to 16 a rating of 2, etc. 
Group 5 contained individuals with Taylor 
scores from 33 to 44. The number of Ss within 
each group from 1 to 5 were 7, 17, 16, 21, 
and 19, respectively. The five anxiety groups 
were equated for age, education, and total 
Wechsler-Bellevue IQ. Also there were no 
significant group differences in performance 
on any of the eleven W-B subtests, as meas- 
ured by F tests and epsilon correlations. The 
finding of no significant differences in W-B 
scores is important since both Grice and Ker- 
rick and our own group have reported mod- 
erate but significant negative correlations 
(ranging from -.20 to -.40) between the 
Taylor Scale and some measures of “intelli- 
gence.” For the present data, the value of 
the correlation (epsilon) was not significant. 

The learning task was a complex, double-disk 
pursuitmeter loaned by D. Lewis. For 
each S there were 20 trials of 20 seconds each 
followed by 40 seconds of rest. Two measures 
of learning were used: gain (the difference 
between each S’s average score on the last 
two trials and his average score on trials 1 
and 2), and total time on target. 


1 An extended report of this study may be obtained 
without charge from Ruth Matarazzo, Massachusetts 
General Hospital, Boston 14, Massachusetts 
or for a fee from the American Documentation 
Institute. Order Document No. 4748 from ADI Auxiliary 
Publications Project, Photoduplication Service, 
Library of Congress, Washington 25, D. C., remitting 
in advance $1.75 for microfilm or $2.50 for photocopies. 
Make checks payable to Chief, Photoduplication 
Service, Library of Congress. 

Results indicate no statistical relationship 
between either of these two learning meas- 
ures and Taylor anxiety level. Despite this 
lack of significance as shown by F tests (.29 
and .00) and correlation measures (epsilon 
values of .04 and .22), there was a trend for 
the middle anxiety groups (2 and 3) to be 
superior learners. This latter finding, while of 
no interest in itself, takes on some slight sig- 
nificance in view of the finding of a similar 
curvilinear relationship in two previous stud- 
ies from our laboratory, and two unpublished 
studies of Spence and Taylor from the Iowa 
laboratory. 

The present negative results are of interest 
in view of the findings of both Grice and 
Kerrick, and our own previous work on the 
relationship between “intelligence” and score 
on the Taylor scale. Furthermore the present 
findings do not support our earlier proposed 
hypothesis that timed tests might be more 
sensitive to variations in Taylor anxiety level 
than are other tests. Nor do the present find- 
ings support our earlier belief that the trend 
toward a curvilinear relationship, found with 
a digit-symbol learning task, was an example 
of the type of case described by Webb and 
Lemmon, in which the interpretation of a non- 
significant F test should be qualified. 


Brief Report. 
Received October 28, 1955. 


A Validation Study of the Bender-Gestalt¹ 


Benjamin Mehlman and Edward Vatovec 
Kent State University 


In recent years, increasing attention has 
been given to the application of test pro- 
cedures to the problems of organic brain pa- 
thology. If, as implicitly and explicitly indi- 
cated (3, 7, 9, 11), there are dimensions of 
behavior that are peculiar to the organic, 
then tests may be evolved which can distin- 
guish the organic. The purpose of this study 
is to determine whether the Bender Gestalt 
Test reliably differentiates organic from func- 
tional psychotic institutionalized individuals. 

Before the introduction of the Bender Ge- 
stalt Test, perception and reproduction of de- 
sign by psychotic patients had received only 
scant attention by other authors (11). In 
1938, Bender (3, 4) assembled nine Gestalt 
figures derived from Wertheimer’s configura- 
tions into what is now known as the Bender 
Visual Motor Gestalt Test. This test was built 
on the premise that the reproduction of test 
design is a visual-motor act in which distor- 
tions of original pattern indicate malfunction- 
ing due to neural injury, variations in intel- 
lectual levels, or maladjustment in the emo- 
tional field of the perceiving subject. Schilder 
(14), in the preface to Bender’s monograph 
of the test (3), states that “Personal experi- 
ence has taught me that the clinical value of 
the test is very great. It may allow a differ- 
ential diagnosis between organic deterioration, 
so called functional mental disease and ma- 
lingering.” 

Several studies with the Bender Gestalt 
have failed to differentiate the functionally ill 
from the nonfunctionally ill, or to differentiate 
among groups of functionally ill (1, 5, 6, 13). However, 
Irion and Pascal (15, p. 148) and Hobbs 
(10) report reliable differentiations between 
psychiatric patients and nonpatients. Preliminary 
observations by Barkley (2) with plastic 
relief reproductions suggest differentiation between 
organic and nonorganic. 


1 Based on a master’s thesis submitted by the 
junior author. 

Halpern (8), writing about the Bender Ge- 
stalt in a text on projective techniques, re- 
ports that where the test has been given be- 
fore and after brain injury or other traumatic 
experiences, the disturbance in personality 
found expression through the resulting distor- 
tions of the gestalten. Woltmann (16) sum- 
marizes the work reported by Bender on or- 
ganic brain diseases in her monograph and 
says it proves beyond doubt that organic 
brain disease does interfere with the visual- 
motor gestalt functions. 

These studies, with their variant conclu- 
sions as to the worth of the Bender Gestalt 
Test, have led to the current attempt to clarify 
the worth of this test. This study seeks to de- 
termine whether the Bender Gestalt reliably 
differentiates organic from functional psy- 
chotic institutionalized individuals. 


Method 


Psychotic patients from Massillon State 
Hospital, Massillon, Ohio, were used for this 
study. The name, sex, age, diagnosis, and the 
time spent in mental hospitals were recorded 
individually for each of the 3,216 patients at 
the hospital on separate file cards. These cards 
were then separated into four groups: one 
which was made up from male patients who 
carried an official hospital diagnosis of func- 
tional psychoses; another of female functional 
patients; one of male organic patients and 
the last of female organic patients. Eliminated 
from these groups were patients who carried 
a diagnosis of mental deficiency. The groups 
were then arranged in order of chronological 
age. It was possible individually to match 118 
functionally psychotic male patients with 118 
organically psychotic male patients of the 
same age and of not more than one year’s 
difference in regard to time spent in the hos- 
pital. This same procedure was followed in 
choosing the female population of the study. 
There were 102 paired female patients fol- 
lowing this procedure. 

Eliminated from this study were patients 
who had undergone any form of psychosur- 
gery. Also eliminated were any patients who 
had undergone any form of convulsive treatment 
within the previous six months. Kalinowsky 
and Hoch (12) summarized studies 
made in this field and came to the conclusion 
that although there is no actual impairment 
of cerebral functioning following a course of 
convulsive therapy, there is some question as 
to how long after treatments have been dis- 
continued these functionings return to nor- 
mal. The six-month period was here consid- 
ered a long enough time for the effect of the 
treatment to disappear. Also eliminated from 
this study were patients who refused to take 
the test, were unable to comprehend the di- 
rections, had markedly poor eyesight or other 
obvious or known physical handicaps which 
interfered with ability to take the test. 

The elimination of patients for the above 
reasons reduced the 118 paired males to 34 
matched male patients, and the 102 paired 
female patients to 21 matched female pa- 
tients. After the protocols had been collected, 
two sets of 25 pairs of protocols each were 
selected in unbiased fashion from among the 
pairs of protocols available. 

The design of this study called for sta- 
tistically better than chance differentiation by 
experts between organically and functionally 
ill from the set of 25 matched pairs of proto- 
cols. To give this task to individuals only 
vaguely or incompletely familiar with the 
Bender Gestalt Test as a clinical instrument 
would have loaded the data against finding 
positive results. Sending protocols to experts, 
defined as individuals who had published ac- 
counts of the clinical research use of the test, 
and who were enthusiastic about the test, increased 
the likelihood of securing positive results. 
There were eight such individuals located 
by survey of the literature. One refused 
to participate and several felt unable to participate, 
for reason of time or the like. Four 
accepted invitations to participate, but only 
three returned the materials in completed 
form. Each authority received 25 paired pro- 
tocols, each protocol containing only the code 
letter “A” or “B,” with instructions to place 
either letter under the organic or functional 
column on the answer sheet. 


Results 


Because we had expected approximately 
eight judges to participate, two sets of 25 pairs 
of matched protocols had been arranged. In 
the event, two judges used one set of 25 
and the other judge used the remaining set. 
Since the placing of a pair of protocols into 
one or another of the sets was a chance af- 
fair, there is no reason to believe that the 
sets are not equivalent in any respect. Set 1 
had 15 pairs of males, 10 pairs of females, 
and Set 2 had 16 pairs of males, 9 pairs of 
females. 

On the basis of chance alone we could ex- 
pect each judge to get 12.5 choices correct. 
Judge A secured 11 successes out of the 25 
choices in Set 1. Since this represents fewer 
successes than the chance mean of 12.5, and since we are 
here concerned only with a one-tailed test, 
this represents a failure on the part of the 
Bender Gestalt, as utilized by this expert, to 
discriminate successfully between the organi- 
cally and functionally ill. 

Judge B secured 17 correct choices of the 
possible 25 in Set 2. Using a one-tailed test, 
this represents successful choice at the .035 
level of confidence. In other words, there are 
only 3.5 chances in 100 that Judge B could 
have been right this number of times on the 
basis of chance alone. 

Judge C secured 18 correct choices of the 
possible 25 in Set 1. Using a one-tailed test, 
this represents successful choice at the .014 
level of confidence. In other words, there are 
only 1.4 chances in 100 that Judge C could 
have been right this number of times on the 
basis of chance alone. 
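
The chance probabilities quoted for the three judges can be checked with a short sketch. The reported levels (.035 and .014) are close to what a normal approximation to the binomial without continuity correction gives (z = 1.8 and z = 2.2); that this approximation was the one used is an assumption, since the text does not name the method. The exact binomial tail is computed alongside for comparison. Function names are illustrative only.

```python
from math import comb, erfc, sqrt


def exact_upper_tail(successes, n, p=0.5):
    """Exact binomial probability of at least `successes` correct out of n."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(successes, n + 1))


def normal_upper_tail(successes, n, p=0.5, continuity=False):
    """One-tailed normal approximation to the same probability."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    x = successes - 0.5 if continuity else successes
    z = (x - mean) / sd
    return 0.5 * erfc(z / sqrt(2))


# Correct choices out of 25 pairs, as reported for Judges A, B, and C.
for judge, hits in (("A", 11), ("B", 17), ("C", 18)):
    print(f"Judge {judge}: {hits}/25  "
          f"normal approx. = {normal_upper_tail(hits, 25):.3f}  "
          f"exact binomial = {exact_upper_tail(hits, 25):.3f}")
```

Under these assumptions the exact one-tailed probabilities are somewhat larger than the approximate ones (about .054 for 17 correct and .022 for 18 correct), so Judge B's result sits near the conventional .05 boundary rather than clearly inside it.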

In summary then, we have found that two 
of our judges attained or closely approached 
better-than-chance success in discriminating 
the organics from the functionally 
psychotic through the Bender Gestalt, 
and the third judge failed to have more than 
chance success. 


Discussion 


Although the Bender Gestalt had been in 
existence since 1938, it was only during World 
War II, when there was a felt need for a 
reliable test to determine organic brain pa- 
thology, that the Bender Visual Motor Ge- 
stalt Test became the subject for investiga- 
tion. In looking over the studies on the Bender 
Gestalt Test we can find no rigorous study of 
the Bender Gestalt’s ability to differentiate 
between functional psychotic patients and or- 
ganically psychotic patients. 

The current study suggests that while this 
instrument is not used equally well by differ- 
ent experts, at least with respect to the highly 
focused task of this study, in the hands of 
some it can reliably distinguish organic from 
functional psychotic. While variability in skill 
in utilizing such an instrument is no new find- 
ing, such heterogeneity as found here among 
authorities so highly select is surprising. This 
is particularly so when we consider that the 
differentiations were to be made from among 
patients not selected as differential diagnostic 
problems; i.e., these protocols were taken 
from patients who, to the best of our knowl- 
edge, are typical of patients in general from 
their respective groups. 

Very much to the point too is the fact that 
even the most successful of our experts was 
wrong in 7 of 25 choices. The statistical sig- 
nificance should not lead to complacency 
about the Bender Gestalt. This is particularly 
true when we recall that the task set for our 
experts here was far easier than the task the 
clinician meets in his daily routine; i.e., here 
the judge, having diagnosed one protocol, had 
also diagnosed the other. In addition, our 
judges had a much more gross kind of label- 
ing to do than is generally encountered in 
clinical practice. That there is some basis for 
the enthusiasm expressed in the literature has 
been demonstrated here. But the need for 
greater refinement in the use of the Bender 
Gestalt is equally apparent. 


Summary and Conclusions 


This study was undertaken to determine 
whether the Bender Visual Motor Gestalt 
Test reliably differentiates organic from func- 
tional institutionalized individuals. 

Twenty-five paired protocols were sub- 
mitted to three authorities of the Bender Ge- 
stalt Test for blind analysis to determine 
whether the experts were able to pick out the 
organic and functional psychotic protocol from 
the matched pairs. Two judges attained or 
closely approached better than chance success 
and another had no more than chance success. 
These results suggest a surprising heteroge- 
neity of skill in the use of Bender Gestalt, 
even among experts. These results suggest 
many mistaken diagnoses, even by the best 
judge. Further refinement in the use of the 
Bender Gestalt is indicated, but the utiliza- 
tion of the Bender Gestalt alone in clinical 
practice is not warranted. 


Received June 14, 1955. 


Although the ranking method has been used 
to quantify clinical judgments (4, 10), little 
is known of the parameters affecting the re- 
liability and validity of judgments measured 
by this method. Characteristics of the stimuli, 
scale, and judges have been shown to influ- 
ence the reliability of rating scale judgments 
(2, 3, 6), but similar research on the ranking 
method is nonexistent. 

The present study was concerned with the 
problem of using the ranking method to estab- 
lish a criterion ordering of clinical material 
based upon the judgments of experts. The cen- 
tral problem in establishing such a judgmental 
criterion is lack of agreement among the ex- 
perts. A relatively low average intercorrelation 
among judges does not necessarily imply in- 
dividual judge unreliability, but that the ex- 
perts may be using different bases in making 
judgments of clinical material. If two clinical 
psychologists rate or rank clinical material 
using similar bases of judgment we would ex- 
pect them to show high intercorrelation, but 
if they look at different aspects of the mate- 
rial their intercorrelation might be expected 
to be disappointingly low. As an example, 
Bendig (1) found that for esthetic judgments 
interjudge reliability (average intercorrelation 
among judges) is not a simple rectilinear 
function of intrajudge reliability (retest cor- 
relation for single judges). 

A statistical method that can be used to test 
whether clinicians are using different bases of 
judgment and also to purify a judgmental cri- 
terion is inverse factor analysis. Factor-ana- 
lyzing a matrix of intercorrelated judgments 
from clinicians would reveal whether a single 
general “agreement” factor can explain the 
obtained intercorrelations or whether a series 
of independent group factors is necessary. 
In the latter case the group factors would 
offer a means of categorizing the clinicians as 
to those with similar and dissimilar bases of 
judgment. Also the clinical material could be 
made more factorially homogeneous by elimi- 
nating material which clinicians using dissimi- 
lar bases disagreed upon and retaining only 
material that they judge similarly, regardless 
of their basis of judgment. However, the 
“purified” material should be validated by 
having other judges similarly rank the “puri- 
fied” set of clinical material. 


Procedure 


Subjects. The 50 judges consisted of seven 
clinical psychologists and 43 undergraduate 
students in two sections of a course in mental 
hygiene. All the clinical psychologists had re- 
ceived the Ph.D. degree and had worked for 
a minimum of one year in clinical practice 
with children and adolescents. The under- 
graduate Ss had a minimum of three semester 
hours in psychology with many having more. 
The Ph.D. Ss were contacted individually and 
asked to read and rank for adjustment level 
15 abstracted case histories. They were in- 
structed to use their own clinical definition of 
adjustment, but it was suggested that they 
use as their reference point the behavior of 
the average child in a normal environment. 
The undergraduate Ss were contacted in class 
and a booklet containing either 10 or 15 of 
the same case histories was given to each. A 
sheet of instructions similar to those given 
the Ph.D. Ss was attached to the booklet. 
The undergraduate Ss read and ranked the 
cases outside of the class and returned the 
booklet at a subsequent class period. 

Stimuli. The 15 case histories were abstracted 
from longer and more complete cases 
published in a number of texts on clinical 
child psychology and psychiatry. A standard 
outline was used with each case and the in- 
formation was abstracted under six areas that 
previous interviews with clinical child psy- 
chologists had indicated were most useful and 
necessary. Each case abstract was approxi- 
mately one page of single-spaced typing in 
length. Of the 15 cases selected, 10 were boys 
and 5 were girls. The range in age of the 
cases was from 7 to 16 years old with the 
median at 12 years. 

Analysis. The rankings of the seven clinical 
psychologist judges were intercorrelated by 
the rank-difference method and the resulting 
matrix of correlations subjected to a centroid 
factor analysis. The obtained orthogonal fac- 
tors were analytically rotated to approximate 
simple structure. Each judge’s ranks of the 
15 cases were weighted by his factor loadings 
on the rotated orthogonal factors and a mean 
weighted factor score obtained for each of the 
cases on each factor. The five cases whose fac- 
tor scores showed the greatest intracase dis- 
crepancy were then selected for elimination. 
Booklets containing the original 15 cases or 
the 10 cases after the above elimination 
process were distributed to the undergradu- 
ate student judges for their adjustment rank- 
ing judgments. The average intercorrelation 
among the student Ss in the two judgmental 
groups (those ranking 10 cases and those 
ranking 15) and their average correlation 
with the ranks of the same cases assigned by 
the clinical psychologists were computed by 
the formulas given by Lyerly (9). The differ- 
ences between these average correlations from 
the two groups of student Ss were tested for 
significance. 
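
The elimination step can be sketched as follows. The text does not give the exact weighting formula, so this sketch assumes that a case's score on a factor is the loading-weighted mean of the judges' ranks (normalized by the sum of absolute loadings) and that "intracase discrepancy" is the variance of a case's scores across the three factors. The loadings are the rotated values reported in Table 1 with decimal points restored; the ranks are randomly generated placeholders, not the judges' actual rankings, and all names are hypothetical.

```python
import random
from statistics import pvariance

# Rotated loadings of psychologists A-G on factors A, B, C (from Table 1).
LOADINGS = [
    (0.01, 0.87, 0.00),
    (0.79, 0.54, 0.00),
    (0.73, 0.10, 0.02),
    (0.38, 0.43, 0.75),
    (0.33, 0.90, 0.08),
    (0.68, 0.14, 0.43),
    (0.39, 0.83, -0.30),
]


def weighted_factor_scores(ranks, loadings):
    """ranks[j][c] is judge j's rank of case c; returns scores[c][f]."""
    n_cases = len(ranks[0])
    n_factors = len(loadings[0])
    scores = []
    for c in range(n_cases):
        row = []
        for f in range(n_factors):
            weight_total = sum(abs(l[f]) for l in loadings)
            weighted_sum = sum(l[f] * r[c] for l, r in zip(loadings, ranks))
            row.append(weighted_sum / weight_total)
        scores.append(row)
    return scores


def split_cases(scores, n_drop=5):
    """Drop the n_drop cases whose factor scores disagree most with each other."""
    discrepancy = [pvariance(row) for row in scores]
    order = sorted(range(len(scores)), key=lambda c: discrepancy[c], reverse=True)
    return sorted(order[n_drop:]), sorted(order[:n_drop]), discrepancy


# Placeholder rankings of 15 cases by each of the seven judges.
rng = random.Random(0)
ranks = [rng.sample(range(1, 16), 15) for _ in LOADINGS]

scores = weighted_factor_scores(ranks, LOADINGS)
kept, dropped, discrepancy = split_cases(scores)
print("cases retained:", [c + 1 for c in kept])
print("cases eliminated:", [c + 1 for c in dropped])
```

With real rankings, the retained cases are those on which judges using different bases of judgment nevertheless arrive at similar placements.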


Results 


The average intercorrelation among the 
clinical psychologists ranking 15 case histories 
was .51 and ranged from .04 to .83. 
The centroid factor analysis of the 21 intercorrelations 
yielded three orthogonal factors 
and the analysis was replicated three times to 
stabilize the communality estimates. After the 
third iteration the mean absolute difference 
between the estimated and obtained communalities 
was .01. The median absolute residual 
correlation after extraction of the third factor 
was .03.² The original factor loadings were 
analytically rotated to approximate simple 
structure and the rotated factor loadings are 
found in Table 1. It can be seen that there 
was no general “agreement” factor among the 
judges, but the three group factors obtained 
accounted for 80 per cent of the intrajudge 
variability in judgment. Five of the judges 
showed large loadings on factor B which accounted 
for 39 per cent of judgmental variability, 
while six of the judges had significant 
loadings on factor A which measured another 
29 per cent of intrajudge variance. Factor C 
was bipolar and comprised the remaining 12 
per cent of the variation in judgments, with 
two judges on the high positive end of this 
dimension and one judge on the negative end. 
Judgmental communality as estimated by 
these three factors varied from 54 to 93 per 
cent for the seven clinical psychologists. 


Table 1 
Summary of Centroid Factor Analysis of Interjudge 
Correlations of Clinical Psychologists 

                         Rotated Factor Loadings 
Clinical 
Psychologist               A      B      C      h² 
A                         01     87     00     76 
B                         79     54     00     92 
C                         73     10     02     54 
D                         38     43     75     89 
E                         33     90     08     93 
F                         68     14     43     67 
G                         39     83    -30     93 
Percentage of 
Intrajudge Variance       29     39     12     80 


1 The author’s appreciation is extended to Miss 
Phyllis Black for the abstracting of the case histories. 


2 A table giving the original and residual intercor- 
relation between the clinical psychologists, the origi- 
nal factor loadings, and the transformation matrix 
has been deposited with the American Documenta- 
tion Institute. Order Document No. 4744 from the 
ADI Auxiliary Publications Project, Photoduplication 
Service, Library of Congress, Washington 25, D. C., 
remitting in advance $1.25 for microfilm or $1.25 for 
photocopies. Make checks payable to Chief, Photo- 
duplication Service, Library of Congress. 
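
The communalities and variance percentages in Table 1 follow directly from the rotated loadings, which gives a convenient check on the table. The sketch below assumes orthogonal factors, so that a judge's communality is the sum of his squared loadings and a factor's share of variance is the mean squared loading across the seven judges. The loadings are read from Table 1 with decimal points restored; variable names are illustrative.

```python
# Rotated loadings of psychologists A-G on factors A, B, C (from Table 1).
LOADINGS = {
    "A": (0.01, 0.87, 0.00),
    "B": (0.79, 0.54, 0.00),
    "C": (0.73, 0.10, 0.02),
    "D": (0.38, 0.43, 0.75),
    "E": (0.33, 0.90, 0.08),
    "F": (0.68, 0.14, 0.43),
    "G": (0.39, 0.83, -0.30),
}

# Communality of each judge: sum of squared loadings, as a percentage.
communality = {judge: round(100 * sum(l * l for l in row))
               for judge, row in LOADINGS.items()}

# Share of variance for each factor: mean squared loading, as a percentage.
n_judges = len(LOADINGS)
factor_percent = [round(100 * sum(row[f] ** 2 for row in LOADINGS.values()) / n_judges)
                  for f in range(3)]

print("communalities (per cent):", communality)
print("variance per factor (per cent):", factor_percent)
print("total (per cent):", sum(factor_percent))
```

Both sets of figures agree with the h² column and the bottom row of Table 1, which supports the reading of the table's row and column labels.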





Ranking Methodology with Clinical Case Histories 


Each clinical psychologist’s rank judgments 
were then weighted by his factor loadings 
given in Table 1, a mean weighted score 
on each of the three factors was obtained 
for each case, and the five case histories 
with the largest interfactor variability were selected 
for elimination. New rank-difference correlations 
between the clinical psychologists for 
the remaining 10 cases were computed and it 
was evident that only one general factor could 
account for the matrix of interjudge correla- 
tions. The mean correlation between these 
judges on these 10 case histories was .69 and 
Kendall’s coefficient of concordance (8) was 
.74. The case sums of ranks for either the 10 
or the 15 case histories assigned by the clini- 
cal psychologists were used to establish cri- 
terion rank ordering of the cases. Applying 
the Spearman-Brown formula to the average 
interjudge correlations gave a group reliability 
of .88 for the 15 cases and a group reliability 
of .94 for the summed ranks of the 10 case 
histories. 
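
The group reliabilities just quoted can be reproduced from the average interjudge correlations. The sketch below applies the Spearman-Brown formula for k judges and, as a cross-check, the standard identity linking Kendall's coefficient of concordance W to the mean rank correlation among m judges. A direct computation of W from untied ranks is included for completeness; the function names are illustrative, not the author's.

```python
def spearman_brown(mean_r, k):
    """Reliability of the pooled judgments of k judges with mean intercorrelation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def w_from_mean_rho(mean_rho, m):
    """Kendall's W implied by the mean Spearman rho among m judges."""
    return ((m - 1) * mean_rho + 1) / m


def kendall_w(ranks):
    """Kendall's W from ranks[j][i], judge j's rank of item i (no tied ranks)."""
    m, n = len(ranks), len(ranks[0])
    rank_sums = [sum(judge[i] for judge in ranks) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Seven clinical psychologists: mean r = .51 (15 cases) and .69 (10 cases).
print(f"15 cases: group reliability = {spearman_brown(0.51, 7):.2f}")
print(f"10 cases: group reliability = {spearman_brown(0.69, 7):.2f}")
print(f"10 cases: W implied by mean rho = {w_from_mean_rho(0.69, 7):.2f}")
```

The two reliabilities come out at .88 and .94, as reported. The implied W of about .73 is close to the reported .74, the small difference being attributable to rounding of the mean correlation.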

Booklets containing a face sheet of instruc- 
tions plus either the original 15 case histories 
or the above selected judgmentally homoge- 
neous 10 cases were assembled and distributed 
to 43 students in two undergraduate sections 
of a mental hygiene course. These Ss ranked 
the 10 or 15 cases for adjustment level and 
their assigned ranks were summed for each 
case. Average interjudge correlations among 
the undergraduate Ss within each of the two 
groups and their average correlation with the 
criterion rankings of the same cases were 
computed by the formulas given by Lyerly 
(9). The differences between the two under- 
graduate groups of Ss were tested for signifi- 
cance by the methods outlined by Edwards 
(5, p. 136) which Hartley (7) has shown 
to be applicable to rank correlations. The 
summed ranks given by the undergraduate Ss 
to the cases were used to rank the cases and 
these ranks correlated with the criterion rank- 
ing of the clinical psychologists by the usual 
rank-difference method. These results can be 
found in Table 2. Both the average intercorrelation 
among the undergraduate Ss ranking 
the 10 homogeneous cases and their average 
correlation with the criterion rankings of the 
clinical psychologists were significantly larger 
(at the .01 level) than the corresponding values 
for the Ss ranking 15 cases. The average interjudge 
correlations were .51 and .28 for the groups 
ranking 10 and 15 cases, respectively (t = 2.68), 
while their average correlations with the criterion rankings 
were .70 and .47 (t = 3.37). The total ranks 
of the 10-case group showed almost perfect 
correspondence with the total ranks given by 
the clinical psychologists (rho = .97), while 
the correspondence for the 15-case group was lower 
(rho = .76). 


Discussion 


The results of the present study agree with 
other reports indicating that clinical psycholo- 
gists as a group show a moderate average in- 
tercorrelation in their judgments, but that the 
variability in agreement level among various 
pairs of experts is extremely large. For ex- 
ample, the average intercorrelation among our 
seven clinical psychologists was .51, but single 
correlation coefficients ranged from .83 to .04. 
The factor analysis of these intercorrelations 
indicated the absence of a general factor of 
agreement and the presence of several group 
factors. We suggest that a categorization of 
judges on the basis of the pattern of their 
group factor loadings will result in subgroups 
of clinicians internally homogeneous as to the 
bases they use in making clinical judgments. 


Table 2 
Reliability Coefficients of Undergraduate Student Judges Ranking 10 or 15 Case Histories 

Number     Number                 Average                    Average         Group 
of case    of       Coeff. of     interjudge   Group         corr. with      corr. with 
histories  judges   concordance   correlation  reliability   psychologists   psychologists 
10         24       .52           .51          .96           .70             .97 
15         19       .32           .28          .88           .47             .76 
Significance of difference (t)    2.68*                      3.37* 

* Significant at the .01 level. 
Naturally this is an hypothesis that was not 
tested by the present study, but one that is 
capable of being tested. 

The technique of developing a factorially 
homogeneous set of clinical material by factor 
analyzing experts’ judgments appears to be 
generalizable to the judgments of naive, less 
experienced judges. The student Ss ranking 
the homogeneous case histories showed a 
higher average level of agreement among 
themselves and also as a group correlated 
higher with the judgments of the clinical psy- 
chologists. Probably student Ss have just as 
many, if not more, different bases and cri- 
teria for judging clinical material as do ex- 
perienced clinical psychologists and the ex- 
perimental procedure of selecting clinical 
material that experienced judges using dis- 
similar judgmental criteria can agree upon 
also results in greater agreement among less 
experienced judges. 

The present study was designed solely to 
test whether the factor analytic procedure 
herein described would help in developing a 
homogeneous set of clinical stimuli. These 
case histories can now be used in further re- 
search on situational variables that affect the 
reliability of clinical judgments quantified by 
the ranking method. 


Summary 


The study was concerned with whether in- 
verse factor analysis procedures could be used 
to develop a homogeneous set of clinical case 
histories. Seven clinical psychologists ranked 
for adjustment level 15 abstracted clinical 
case histories of children. The matrix of clinician
intercorrelations was factor analyzed, and three group
factors were obtained which accounted for 80 per cent
of interjudge variation in judgment. Factor scores were
determined for each case, and the 10 cases with the
smallest intracase variance in factor scores were selected
as a more homogeneous group. Undergraduate
student Ss (N = 43) then ranked for ad- 
justment level either the original 15 or the 
selected 10 case histories. The student Ss 
ranking the 10 cases were shown to have a 
significantly higher average interjudge corre- 
lation and to correlate higher with the judg- 
ments of the clinical psychologists. 


Received June 20, 1955. 
An exceedingly important problem in clinical
psychology and in all interpersonal relations
is the understanding of other persons, a
problem on which there recently has been a 
rapidly growing literature, summarized quite 
ably by Taft (6). Among the various meth- 
ods reviewed by him to measure the ability to 
judge people has been that of prediction of 
behavior or life-history data. As Taft de- 
scribes this method, the judge has some ac- 
quaintance with or is given data about the 
subject, and he is required to predict the sub- 
ject’s performance on various test items or 
his responses to attitude and personality tests, 
or to predict certain aspects of the subject’s 
life history. These are the tests which have 
been labeled “empathy” tests. 

Quite a number of these instruments have 
been developed, as Taft indicates. One of the 
pioneer measures in this field is that of Dy- 
mond (1) whose test is made up of 4 parts, 
each containing the same 6 items. In the first 
part, the individual is asked to rate himself 
on a 5-point scale on each of 6 character- 
istics. In the second part he is asked to rate 
some other individual on the same 6 traits. 
In the third, he is required to rate the other 
individual as he believes this other would 
rate himself. In the fourth, he must rate him- 
self as he thinks the other would rate him. In 
other words, according to Dymond, if two in- 
dividuals, A and B, were being tested for their 
empathy with each other, the procedure would 
be as follows: 


A. Part 1. A rates himself (A). 
2. A rates B as he (A) sees him. 
3. A rates B as he thinks B would rate 
himself. 
4. A rates himself (A) as he thinks B would 
rate him. 
B. Part 1. B rates himself (B).
2. B rates A as he (B) sees him. 
3. B rates A as he thinks A would rate 
himself. 
4. B rates himself (B) as he thinks A would 
rate him. 


Therefore, a measure of A’s empathic ability
is derived by calculating how closely his
predictions of B’s ratings (A3 and A4) correspond
with B’s actual ratings (B1 and B2).
Similarly, a measure of B’s empathy with A
is obtained by calculating how closely his
predictions of A’s ratings (B3 and B4) correspond
to A’s actual ratings (A1 and A2).
Since each rating is made on a 5-point scale,
the test is scored in terms of the total number
of points the individual is in error in his
predictions. This is called the deviation score.
The 6 traits used as items in all 4 parts of 
the test are self-confidence; superior-inferior; 
selfish-unselfish; friendly-unfriendly; leader- 
follower; sense of humor. 
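
The deviation score described above can be sketched in a few lines. This is an illustration under stated assumptions, not Dymond's published scoring routine: it assumes that "points in error" means the absolute difference between a predicted and an actual rating, summed over the 6 traits and over both predictions (A3 against B1, A4 against B2). The function name and the ratings are hypothetical.

```python
TRAITS = [
    "self-confidence", "superior-inferior", "selfish-unselfish",
    "friendly-unfriendly", "leader-follower", "sense of humor",
]


def deviation_score(a3, a4, b1, b2):
    """Total points of error in A's predictions of B's ratings.

    a3: A's prediction of how B rates himself   (compared with b1)
    a4: A's prediction of how B rates A         (compared with b2)
    Assumption: error is the absolute difference on the 5-point scale.
    """
    error_part3 = sum(abs(p - q) for p, q in zip(a3, b1))
    error_part4 = sum(abs(p - q) for p, q in zip(a4, b2))
    return error_part3 + error_part4


# Hypothetical ratings on the 6 traits, each on a 1-5 scale.
a3 = [4, 3, 2, 5, 3, 4]  # A: "B would rate himself like this"
b1 = [4, 2, 2, 4, 3, 5]  # B's actual self-rating
a4 = [3, 3, 4, 4, 2, 3]  # A: "B would rate me like this"
b2 = [3, 4, 4, 4, 1, 3]  # B's actual rating of A

score = deviation_score(a3, a4, b1, b2)
print(score)  # 5
```

Because the score counts errors, a lower value indicates closer prediction of the other person's ratings.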

The Dymond test may be termed a meas- 
ure of individual empathic ability, in that the 
judge is required to make predictions concern- 
ing another individual. In other words, ac- 
cording to the test author, he is required to 
transpose himself into the thinking, feeling, 
and acting of another person, to structure the 
world as he does, or to empathize with him
as an individual. 

Now, another approach to the ability to 
judge other people may be termed that of 
“mass empathy.” In this approach, the judge 
is required to predict the combined response 
of a group of individuals, rather than those 
of a single person. Taft (6) cites at least 10 
studies in which this method was used, but 
does not include that of Norman and Ains- 
worth (5), owing to the comparative recency 
of the latter study. These investigators developed
a procedure for the measurement of
mass empathy which involved administration 
of a personality inventory (the Guilford- 
Martin Inventory of Factors GAMIN) in 
two forms. The first form was given in the 
usual way with the subject answering for him- 
self. (An exception to the regular adminis- 
tration was the elimination of the “don’t 
know” alternative, so that subjects were 
forced to answer “Yes” or “No.”) Two weeks 
following the presentation of the first test, a 
second form was given to the subjects. The 
same questions were used, except that instead 
of answering for themselves, the subjects were 
required to answer the questions as they 
thought most other people of their own age 
and sex would answer them. For example, 
question 7 on the original test reads, “Do 
you express such emotions as delight, sorrow, 
anger, and the like readily?” On the second 
form this question was changed to read, “Do 
you think most people of your own age and 
sex express such emotions as delight, sorrow, 
anger, and the like readily?” Because it was 
felt there would be more uncertainty in an- 
swering questions for others, the question or 
“don’t know” alternative was omitted from 
both forms. 

The mass empathy score is derived as fol- 
lows. If 51 per cent or more of the group an- 
swered in a given direction (e.g., “Yes”) on 
a particular question on the first test (i.e., 
when taking it for themselves) and a par- 
ticular subject answered in the same direction 
on the second test (i.e., when he was required 
to judge about most other people), he was 
given a point for empathy. In other words, 
he was “empathic” because he had made an 
accurate statement about what the majority 
(51%) had said about themselves. A sub- 
ject’s mass empathy score is simply the total 
number of points so earned. 
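
The scoring rule just described can be sketched as follows. This is a minimal illustration, not the Norman and Ainsworth scoring key: the function names and data are hypothetical, answers are coded True for "Yes" and False for "No", and items on which neither direction reaches 51 per cent are assumed to earn no point, a case the text does not address.

```python
def majority_answers(self_answers, criterion=0.51):
    """Per-item majority direction on the first form (answers about oneself).

    Returns True/False for the majority answer, or None when neither
    direction reaches the criterion (an assumption of this sketch).
    """
    n_subjects = len(self_answers)
    n_items = len(self_answers[0])
    majority = []
    for i in range(n_items):
        yes_share = sum(1 for s in self_answers if s[i]) / n_subjects
        if yes_share >= criterion:
            majority.append(True)
        elif 1 - yes_share >= criterion:
            majority.append(False)
        else:
            majority.append(None)
    return majority


def mass_empathy_score(predicted, majority):
    """One point for each item where the subject's second-form answer
    (about most other people) matches the group's majority self-report."""
    return sum(1 for p, m in zip(predicted, majority)
               if m is not None and p == m)


# Hypothetical first-form answers: 4 subjects, 3 items.
form1 = [
    [True, True, False],
    [True, False, False],
    [True, True, False],
    [False, False, True],
]
majority = majority_answers(form1)

# Hypothetical second-form answers of two subjects.
score_a = mass_empathy_score([True, True, True], majority)
score_b = mass_empathy_score([True, False, False], majority)
```

In the study itself the majority direction was based on the pooled sample of 121 subjects rather than on a small group such as this one.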

However, in both these methods of measur- 
ing empathy, there is the possibility of some 
error. Hastorf and Bender (3) have pointed 
out that in attempting to predict the verbal 
responses of another person on a rating scale 
or personality test, there is the spurious effect 
of projection. They ask whether or not the 


1 The use of the term “projection” by Hastorf and 
Bender (3) differs from that of Norman and Ains- 


prediction being made is closer to the re- 
sponses of the person predicted for (em- 
pathy), or to the predictor’s own score (pro- 
jection). They therefore attempted to isolate 
the factor of projection from that of empathy 
in a study of their own, a process accom- 
plished by taking the difference between the 
empathy and the projection scores. From 
their results, they emphasize the fact that 
part of the successful prediction of another 
person’s responses may be due to projection 
rather than empathy, and that a refined meas- 
ure of empathic ability might approximate 
more adequately the psychological aspects of 
empathy. In the present study, an attempt 
was made to account for this factor of pro- 
jection, as will be indicated later on. 

The basic problem investigated in the pres- 
ent study was to explore the relationships be- 
tween the Dymond test of individual em- 
pathy and the Norman-Ainsworth approach 
to measurement of mass empathy, using both 
raw and refined measures. 


Subjects and Procedure 


Subjects used for this study were 47 mem- 
bers of a social fraternity at The University 
of New Mexico. These men were divided into 
5 groups. Three groups, of 11, 11, and 10
individuals, respectively, were made up of
active members of 1 year membership or
more; and 2 groups, of 9 and 6 individuals,
were composed of newly initiated members.
Selection in all groups was made by
random choice. Each member was instructed 
to rate every other member of the group to 
which he belonged on all scales of the Dy- 
mond test. Since groups of different sizes 
were used, an average empathy score was ob- 
tained for each subject. 

For the mass empathy testing, the same 47 
subjects were given the GAMIN twice accord- 
ing to the procedure described above. In or- 
der that greater confidence might be placed 
in the “51% criterion” obtained from the use 
of the GAMIN, the data of Norman and Ains- 


worth (5) in their study. The former refer to it as a 
simple attribution of one’s own traits to others. The 
latter defined it operationally in its more classical 
psychoanalytical sense, namely, denial of undesirable 
traits in oneself and attribution of them to others, 
or defense projection. In this study, we adhered to 
Bender and Hastorf’s usage. 
worth (5) in their previous study were incorporated
with the data of the present study.
Since their N was 74, and ours was 47, the 
increased N of 121 on which the percentages 
were based was obviously more satisfactory. 
It was found that this procedure was justified, 
since out of 186 items on the GAMIN, there 
were 174 or 93.5 per cent which yielded no 
significant differences between the two studies. 

As mentioned above, the factor of simple 
projection probably enters into the opera- 
tional measures of empathy used in this study 
according to the reasoning of Hastorf and 
Bender (3). Following their suggestion, it was 
decided to attempt to isolate projection on 
both tests. Thus we secured 3 measures—raw 
empathy, projection, and refined empathy. In 
obtaining the refined measure of empathy for 
the Dymond test, the following method was 
used: For each individual rated, the difference 
was taken between the score obtained on 
Part 3 of the test (wherein A rated B as he 
thought B would rate himself) and the score 
on Part 2 (wherein A rated B as he, A, saw 
B). Similarly, the score obtained on Part 4 
(wherein A rated himself as he thought B 
would rate him) was subtracted from the 
score obtained on Part 1 (wherein A rated 
himself). The total projection score for all 
subjects was subtracted from the total raw 
empathy score, and this difference in score, 
divided of course by the number of indi- 
viduals rated, gave an average refined indi- 
vidual empathy score for each subject. The 
refined mass empathy score was obtained in 
a similar fashion, i.e., by subtracting one point 
from the raw empathy score every time an in- 
dividual answered an item about “most other 
people” the same way he answered the item 
about himself. 
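
For the mass empathy measure, the three scores can be sketched directly from the description above: raw empathy counts agreements with the majority, projection counts items answered the same way for "most other people" as for oneself, and refined empathy is the difference. This is a hedged illustration only; the function name and the five-item data are hypothetical, and answers are coded True for "Yes" and False for "No".

```python
def mass_empathy_measures(own, others, majority):
    """Return (raw empathy, projection, refined empathy) for one subject.

    own:      the subject's answers about himself (first form)
    others:   his answers about most other people (second form)
    majority: the group's majority self-report for each item
    """
    raw = sum(1 for p, m in zip(others, majority) if p == m)
    projection = sum(1 for o, p in zip(own, others) if o == p)
    refined = raw - projection  # one point removed per projected item
    return raw, projection, refined


# Hypothetical answers on 5 items.
own = [True, False, True, True, False]
others = [True, True, True, False, False]
majority = [True, True, False, False, False]

raw, projection, refined = mass_empathy_measures(own, others, majority)
print(raw, projection, refined)  # 4 3 1
```

A subject who simply repeats his own answers earns a projection point on every item, so the subtraction can remove most or all of his raw score.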


Results 


The different variables explored in the pres- 
ent study were intercorrelated, with results as 
given in Table 1. The latter part of Table 1, 
containing the correlations between the two 
approaches to the measurement of empathy, 
indicates conclusively that there is no rela- 
tionship between them. The coefficients are all 
in the vicinity of zero and do not differ sig- 
nificantly from that figure. However, the Dy- 
mond Individual Empathy Test reveals a high 
relationship between raw and refined em- 


Table 1

Correlations Between Various Measures of
Empathy and Projection

                                             r

Dymond Individual Empathy Test
  Raw and Refined Empathy                   .81**
  Raw Empathy and Projection                .13
  Refined Empathy and Projection           -.47**

Mass Empathy Test
  Raw and Refined Empathy                  -.21
  Raw Empathy and Projection                .86**
  Refined Empathy and Projection           -.69**

Individual vs. Mass Empathy Tests
  Raw and Raw Empathy                       .07
  Refined and Refined Empathy              -.06
  Projection and Projection                -.10

** Significant at 1% level.


pathy, with the factor of projection entering 
somewhat into the raw empathy score. It is 
probably advisable, though, that the projec- 
tion variable be eliminated with future use of 
this test, since the correlation is significantly 
negative (— .47) between the refined empathy 
score and the projection score. With the Mass 
Empathy Test, on the other hand, the projec- 
tion variable is a serious one with which to 
contend, since the relationship between it and 
raw empathy is very high (r of .86); the cor- 
relation between it and refined empathy is 
also high, the r being -.69. The resulting
correlation between raw and refined empathy
is negative (-.21), although this is not
statistically significant.


Discussion 


The findings of the present study regarding 
the lack of relationship between the measures 
of individual and mass empathy are in agree- 
ment with results of Hall and Bell (2) who 
inquired into the agreement between the Dy- 
mond test and the Kerr Empathy Test (4). 
The latter is also a purported measure of mass 
empathy, requiring the subject to (a) rank 
14 common types of music in the order of 
their popularity among office workers; (b)
rank 15 magazines in order from most to 
least paid circulation; and (c) take the role 
of an individual 40 years of age and rank 10 
experiences from most to least annoying. Hall 
and Bell feel that their test aims to measure 
the individual’s ability to assume the hypo- 
thetical average person’s structural field, i.e., 
somehow to combine a series of “others’” individual
fields into an average. (The mass
empathy approach used here seems to do 
very much the same when the subject is asked 
to make a judgment about “most other peo- 
ple.”) Hall and Bell found an average corre- 
lation of .02 between the Dymond and Kerr 
tests. This compares with the very low r’s 
found in the present study between the two 
measures of the empathic process. 

Regarding the reason for the low correlation
between the mass approach and the individual
approach, Taft (6) has made the
distinction between what he terms (after 
Wallin) analytic and nonanalytic judgments. 
He says, “In analytic judgments, the judge 
(J) is required to conceptualize, and often to 
quantify, specific characteristics of the sub- 
ject (S) in terms of a given frame of refer- 
ence. This mainly involves the process of 
inference, typical performances of J being 
traits, writing personality descriptions, and 
predicting the percentage of a group making 
a given response. In nonanalytic judgments, 
J responds in a global fashion, as in match- 
ing persons with personality descriptions and 
in making predictions of behavior. An em- 
pathetic process is usually involved in non- 
analytic judgments” (6, p. 1). Later, he says, 
“It is suggested that the ‘mass-empathy’ test
of prediction is more likely to be tackled 
analytically than is the empathy test, as it 
does not lend itself so readily to empathizing 
with any particular person. Thus the empathy 
test will be regarded as nonanalytic, and the 
mass-empathy test as analytic . . .” (6, p. 3).

Taft’s distinction between the analytic and 
nonanalytic modes of response suggests fur- 
ther explanation for some of the correlations 
found in the present study. In the Dymond 
test, the rater or judge is forced to think 
about a particular individual with regard to 
a particular trait, especially in our study 
wherein all subjects knew each other quite 
well. In this situation, the factor of personal 
projection probably enters less and so we 
have the very high correlation between raw 
and refined empathy and the slight correla- 
tion between raw empathy and projection. On 
the other hand, in the mass approach, wherein 
the subject must make judgments about “most 
other people,” involving, as Taft says, “a proc- 


ess of inference,” the subject, without a frame 
of reference of a particular person, is quite 
likely to project his own feelings and atti- 
tudes into his answers. Hence the very poor 
r between raw and refined empathy and the 
very high r of .86 between raw empathy and 
projection. 

In any case, Hastorf and Bender’s criticism
(3) is well taken, for the projection factor
seems to enter significantly in both tests,
though to a much greater degree in the mass
approach than in the individual approach.


Summary and Conclusions 


Using 47 members of a social fraternity as 
subjects, two measures of empathy were cor- 
related. These were the Dymond Test, a 
measure of individual empathy, and a pro- 
cedure devised by Norman and Ainsworth for 
the measurement of mass empathy. In view of 
general criticism of these approaches in that 
they did not isolate the factor of projection, 
there were three scores derived from both 
measures—raw empathy, projection, and re- 
fined empathy. Correlations were about zero 
between the individual and mass empathy 
measures for all these variables, but signifi- 
cant correlations were obtained between these 
variables within the measures. Projection does 
enter as a factor in both tests, more seriously 
in the mass approach than in the individual 
approach, probably because there are more 
inferential judgments in the former than in 
the latter. 